PROBLEM DESCRIPTION

HYPOTHESIS

DATA LOADING AND CLEANING

Data Loading

2014 Data

During data loading, the accident classes (CLASE) were checked, and six categories were initially identified:

  1. Atropello (pedestrian hit)
  2. Caida_Ocupante (occupant fall)
  3. Choque (collision)
  4. Incendio (fire)
  5. Otro (other)
  6. Volcamiento (rollover)

However, only 17 Incendio cases were recorded between 2014 and 2017, and 7 in 2018. Because these events are so rare, Incendio was merged into Otro, leaving five final categories:

  1. Atropello
  2. Caida_Ocupante
  3. Choque
  4. Otro
  5. Volcamiento
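
The merge described above can also be written as a lookup keyed by label instead of by level position. The following is a minimal sketch, not the code used in this report: `recode_clase` and its `lookup` table are hypothetical names, and the raw labels are the ones that appear in the `levels()` outputs of the yearly files.

```r
# Hypothetical helper (not part of the original analysis): recode accident
# classes through a named lookup instead of relying on the order of levels().
recode_clase <- function(clase) {
  # Keys are raw labels seen in the yearly files; values are the final categories.
  lookup <- c(
    "Atropello"          = "Atropello",
    "Caída de Ocupante"  = "Caida_Ocupante",
    "Caida Ocupante"     = "Caida_Ocupante",
    "Caída Ocupante"     = "Caida_Ocupante",
    "Choque"             = "Choque",
    "Choque y Atropello" = "Choque",
    "Incendio"           = "Otro",
    "Otro"               = "Otro",
    "Volcamiento"        = "Volcamiento"
  )
  raw <- trimws(as.character(clase))   # drops padding such as "Choque "
  out <- unname(lookup[raw])
  out[is.na(out)] <- "Otro"            # empty or unknown labels fall back to "Otro"
  factor(out, levels = c("Atropello", "Caida_Ocupante", "Choque", "Otro", "Volcamiento"))
}

# Example call on a few raw labels.
recode_clase(c("Choque ", "Incendio", "", "Caida Ocupante"))
```

Because the mapping is keyed by label, it does not depend on the order in which `levels()` returns the categories, which differs between years. It also works when CLASE is read as character, which is the default for `read.csv` from R 4.0.0 onward.
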
raw_data_2014 <- read.csv(file="./data/Accidentalidad_georreferenciada_2014.csv", encoding="UTF-8", header=TRUE, sep=",")
head(raw_data_2014)
##   X.U.FEFF.OBJECTID        X       Y RADICADO                    FECHA
## 1            211279 831190.1 1179690  1423828 2014-01-01T00:00:00.000Z
## 2            211280 835013.6 1184224  1423839 2014-01-01T00:00:00.000Z
## 3            211281 837032.7 1184750  1423840 2014-01-01T00:00:00.000Z
## 4            211282 830649.5 1181383  1423849 2014-01-01T00:00:00.000Z
## 5            211283 833740.8 1188644  1423890 2014-01-01T00:00:00.000Z
## 6            211284 836425.4 1186933  1423892 2014-01-01T00:00:00.000Z
##       HORA DIA PERIODO             CLASE    DIRECCION
## 1 02:20 AM   1    2014            Choque   CR 80 CL 8
## 2 12:50 AM   1    2014         Atropello  CR 53 CL 61
## 3 01:00 AM   1    2014         Atropello  CR 39 CL 70
## 4 12:37 AM   1    2014         Atropello  CL 32 CR 84
## 5 10:40 AM   1    2014 Caída de Ocupante CR 80 CL 101
## 6 04:00 AM   1    2014            Choque  CR 48 CL 93
##              DIRECCION_ENC CBML                    TIPO_GEOCOD GRAVEDAD
## 1 CR  080   008  000 00000 1611 Malla vial aproximada: CR 81-7   HERIDO
## 2 CR  053   061  000 00000 1003                     Malla vial   HERIDO
## 3 CR  039   070  000 00000 0308                     Malla vial   HERIDO
## 4 CL  032   084  000 00000 1617                     Malla vial   HERIDO
## 5 CR  080   101  000 00000 0603                     Malla vial   HERIDO
## 6 CR  048   093  000 00000 0401                     Malla vial   HERIDO
##                 BARRIO          COMUNA       DISENO DIA_NOMBRE MES
## 1   Loma de los Bernal           Belén Tramo de via  MIÉRCOLES   1
## 2       Jesús Nazareno   La Candelaria Tramo de via  MIÉRCOLES   1
## 3    Manrique Oriental        Manrique Tramo de via  MIÉRCOLES   1
## 4         Las Mercedes           Belén Tramo de via  MIÉRCOLES   1
## 5 Doce de Octubre No.2 Doce de Octubre Tramo de via  MIÉRCOLES   1
## 6               Berlin        Aranjuez Interseccion  MIÉRCOLES   1
summary(raw_data_2014)
##  X.U.FEFF.OBJECTID       X                Y              RADICADO        
##  Min.   :211279    Min.   :823074   Min.   :1172279   Min.   :3.800e+01  
##  1st Qu.:222939    1st Qu.:833036   1st Qu.:1181405   1st Qu.:1.436e+06  
##  Median :234284    Median :834331   Median :1183250   Median :1.447e+06  
##  Mean   :234276    Mean   :834150   Mean   :1183303   Mean   :4.810e+15  
##  3rd Qu.:245700    3rd Qu.:835417   3rd Qu.:1185597   3rd Qu.:1.459e+06  
##  Max.   :257033    Max.   :845503   Max.   :1189981   Max.   :5.002e+19  
##                                                                          
##                       FECHA             HORA            DIA       
##  2014-06-03T00:00:00.000Z:  168   04:00 PM:  776   Min.   : 1.00  
##  2014-02-13T00:00:00.000Z:  163   06:00 PM:  768   1st Qu.: 8.00  
##  2014-08-06T00:00:00.000Z:  163   05:00 PM:  766   Median :16.00  
##  2014-08-08T00:00:00.000Z:  158   02:00 PM:  705   Mean   :15.73  
##  2014-05-19T00:00:00.000Z:  157   03:00 PM:  695   3rd Qu.:23.00  
##  2014-07-07T00:00:00.000Z:  157   01:00 PM:  679   Max.   :31.00  
##  (Other)                 :40628   (Other) :37205                  
##     PERIODO                   CLASE               DIRECCION    
##  Min.   :2014   Atropello        : 4779   CR 64 C CL 78:  230  
##  1st Qu.:2014   Caída de Ocupante: 4157   CR 1 CL 1    :  165  
##  Median :2014   Choque           :27157   CR 64 C CL 67:  151  
##  Mean   :2014   Incendio         :    8   CR 57 CL 44  :  118  
##  3rd Qu.:2014   Otro             : 4521   CR 80 CL 50  :  115  
##  Max.   :2014   Volcamiento      :  972   CR 80 CL 65  :  115  
##                                           (Other)      :40700  
##                     DIRECCION_ENC        CBML      
##  CR  064 C   078  000 00000:  248   1019   : 1148  
##  CR  001   001  000 00000  :  165   0517   :  979  
##  CR  064 C   067  000 00000:  151   1507   :  783  
##  CR  057   044  000 00000  :  118   1012   :  774  
##  CR  080   065  000 00000  :  116   1007   :  733  
##  CR  080   050  000 00000  :  115   1105   :  733  
##  (Other)                   :40681   (Other):36444  
##                         TIPO_GEOCOD          GRAVEDAD    
##  Malla vial                   :30320   HERIDO    :23077  
##  Malla vial cruce invertido   : 2251   MUERTO    :  256  
##  EPM sin Interior             :  987   SOLO DAÑOS:18261  
##  EPM con Interior             :  435                     
##  Malla vial aproximada: CR 1-2:  167                     
##  ZONA RURAL                   :   97                     
##  (Other)                      : 7337                     
##                 BARRIO                   COMUNA                DISENO     
##  La Candelaria     : 1151   La Candelaria   : 8760   Tramo de via :33745  
##  Caribe            :  980   Laureles Estadio: 4678   Interseccion : 5305  
##  Campo Amor        :  786   Castilla        : 4204   Lote o Predio: 1083  
##  Perpetuo Socorro  :  775   El Poblado      : 3154   Glorieta     :  702  
##  Los Conquistadores:  735   Guayabal        : 2946                :  256  
##  Guayaquil         :  733   Robledo         : 2828   Ciclo Ruta   :  192  
##  (Other)           :36434   (Other)         :15024   (Other)      :  311  
##      DIA_NOMBRE        MES        
##  DOMINGO  :4209   Min.   : 1.000  
##  JUEVES   :6143   1st Qu.: 4.000  
##  LUNES    :6008   Median : 7.000  
##  MARTES   :6301   Mean   : 6.543  
##  MIÉRCOLES:6420   3rd Qu.: 9.000  
##  SÁBADO   :5984   Max.   :12.000  
##  VIERNES  :6529
str(raw_data_2014)
## 'data.frame':    41594 obs. of  19 variables:
##  $ X.U.FEFF.OBJECTID: int  211279 211280 211281 211282 211283 211284 211285 211286 211287 211288 ...
##  $ X                : num  831190 835014 837033 830649 833741 ...
##  $ Y                : num  1179690 1184224 1184750 1181383 1188644 ...
##  $ RADICADO         : num  1423828 1423839 1423840 1423849 1423890 ...
##  $ FECHA            : Factor w/ 365 levels "2014-01-01T00:00:00.000Z",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ HORA             : Factor w/ 967 levels "01:00 AM","01:00 PM",..: 95 954 1 938 777 201 469 268 37 72 ...
##  $ DIA              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PERIODO          : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
##  $ CLASE            : Factor w/ 6 levels "Atropello","Caída de Ocupante",..: 3 1 1 1 2 3 3 3 3 2 ...
##  $ DIRECCION        : Factor w/ 12673 levels "CL 1 A CR 42",..: 11441 8871 6987 1424 11297 8063 3081 6269 11142 8131 ...
##  $ DIRECCION_ENC    : Factor w/ 12510 levels "000","CL     029  000 00000",..: 11024 8642 6786 915 11166 7817 2469 6072 10863 7898 ...
##  $ CBML             : Factor w/ 391 levels "","0101","01010480009",..: 333 192 39 340 105 48 170 176 241 280 ...
##  $ TIPO_GEOCOD      : Factor w/ 2252 levels "Catastro con Interior",..: 2064 5 5 5 5 5 5 5 5 5 ...
##  $ GRAVEDAD         : Factor w/ 3 levels "HERIDO","MUERTO",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BARRIO           : Factor w/ 313 levels "","6001","Aguas Frias",..: 181 129 201 175 81 40 31 155 294 302 ...
##  $ COMUNA           : Factor w/ 24 levels "","Aranjuez",..: 4 16 18 4 11 2 5 5 17 12 ...
##  $ DISENO           : Factor w/ 13 levels "","Ciclo Ruta",..: 11 11 11 11 11 4 11 4 11 11 ...
##  $ DIA_NOMBRE       : Factor w/ 7 levels "DOMINGO  ","JUEVES   ",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ MES              : int  1 1 1 1 1 1 1 1 1 1 ...
levels(raw_data_2014$CLASE)
## [1] "Atropello"         "Caída de Ocupante" "Choque"           
## [4] "Incendio"          "Otro"              "Volcamiento"
levels(raw_data_2014$CLASE) <- c("Atropello","Caida_Ocupante","Choque","Otro","Otro",
                                 "Volcamiento")
levels(raw_data_2014$CLASE)
## [1] "Atropello"      "Caida_Ocupante" "Choque"         "Otro"          
## [5] "Volcamiento"
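
The first column is named X.U.FEFF.OBJECTID, which suggests that the file begins with a UTF-8 byte-order mark (U+FEFF) that ended up in the header. A minimal sketch of one way to avoid this, assuming that is the cause, is to read the file with `fileEncoding = "UTF-8-BOM"`. The temporary file and the `demo` data frame below are illustrative only.

```r
# Build a tiny CSV that starts with a UTF-8 byte-order mark (EF BB BF).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("OBJECTID,CLASE\n1,Choque\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark while reading,
# so the first column keeps its plain name.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM", header = TRUE, sep = ",")
names(demo)
```

For data that is already loaded, renaming the column with `names(df)[1] <- "OBJECTID"` gives the same result without reading the file again.
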

2015 Data

raw_data_2015 <- read.csv(file="./data/Accidentalidad_georreferenciada_2015.csv", encoding="UTF-8", header=TRUE, sep=",")
head(raw_data_2015)
##   X.U.FEFF.OBJECTID        X       Y RADICADO                    FECHA
## 1              1001 835766.9 1183119  1471803 2015-01-13T00:00:00.000Z
## 2              1002 839496.9 1177763  1471804 2015-01-13T00:00:00.000Z
## 3              1003 832970.8 1178867  1471805 2015-01-13T00:00:00.000Z
## 4              1004 831318.1 1187204  1471806 2015-01-13T00:00:00.000Z
## 5              1005 833710.7 1185631  1471807 2015-01-13T00:00:00.000Z
## 6              1006 839496.9 1177763  1471808 2015-01-12T00:00:00.000Z
##       HORA DIA PERIODO       CLASE       DIRECCION
## 1 02:20 PM  13    2015 Volcamiento     CR 43 CL 54
## 2 09:00 AM  13    2015        Otro       CR 1 CL 1
## 3 10:00 AM  13    2015 Volcamiento    CR 58 D CL 3
## 4 06:30 PM  13    2015        Otro CL 68 B CR 96 E
## 5 05:40 AM  13    2015        Otro     CR 68 CL 70
## 6 09:30 AM  12    2015 Volcamiento       CR 1 CL 1
##                  DIRECCION_ENC CBML                     TIPO_GEOCOD
## 1     CR  043   054  000 00000 1016                      Malla vial
## 2     CR  001   001  000 00000 9086   Malla vial aproximada: CR 1-2
## 3   CR  058 D   003  000 00000 1507  Malla vial aproximada: CR 58-4
## 4 CL  068 B   096 E  000 00000 0724 Malla vial aproximada: CL 68-95
## 5     CR  068   070  000 00000 0519 Malla vial aproximada: CR 69-71
## 6     CR  001   001  000 00000 9086   Malla vial aproximada: CR 1-2
##   GRAVEDAD            BARRIO                       COMUNA       DISENO
## 1   HERIDO            Boston                La Candelaria Tramo de via
## 2   HERIDO Suburbano El Plan Corregimiento de Santa Elena Tramo de via
## 3   HERIDO        Campo Amor                     Guayabal Tramo de via
## 4   HERIDO        Monteclaro                      Robledo Tramo de via
## 5   HERIDO       El Progreso                     Castilla Tramo de via
## 6   HERIDO Suburbano El Plan Corregimiento de Santa Elena Tramo de via
##   DIA_NOMBRE MES
## 1  MARTES      1
## 2  MARTES      1
## 3  MARTES      1
## 4  MARTES      1
## 5  MARTES      1
## 6  LUNES       1
summary(raw_data_2015)
##  X.U.FEFF.OBJECTID       X                Y              RADICADO        
##  Min.   :    1     Min.   :821476   Min.   :1172359   Min.   :3.270e+02  
##  1st Qu.:11558     1st Qu.:833029   1st Qu.:1181443   1st Qu.:1.482e+06  
##  Median :23149     Median :834305   Median :1183204   Median :1.494e+06  
##  Mean   :23110     Mean   :834102   Mean   :1183264   Mean   :2.175e+16  
##  3rd Qu.:34664     3rd Qu.:835387   3rd Qu.:1185476   3rd Qu.:1.506e+06  
##  Max.   :46172     Max.   :842417   Max.   :1193451   Max.   :5.002e+20  
##                                                                          
##                       FECHA             HORA            DIA       
##  2015-08-18T00:00:00.000Z:  183   07:00 AM:  951   Min.   : 1.00  
##  2015-08-28T00:00:00.000Z:  183   08:00 AM:  921   1st Qu.: 8.00  
##  2015-09-11T00:00:00.000Z:  162   10:00 AM:  801   Median :16.00  
##  2015-12-07T00:00:00.000Z:  161   07:30 AM:  767   Mean   :15.74  
##  2015-09-17T00:00:00.000Z:  160   06:30 AM:  756   3rd Qu.:23.00  
##  2015-09-18T00:00:00.000Z:  160   06:00 AM:  741   Max.   :31.00  
##  (Other)                 :41071   (Other) :37143                  
##     PERIODO                   CLASE               DIRECCION    
##  Min.   :2015   Choque           :28249   CR 64 C CL 78:  211  
##  1st Qu.:2015   Atropello        : 4485   CR 63 CL 44  :  177  
##  Median :2015   Otro             : 4218   CR 57 CL 44  :  172  
##  Mean   :2015   Caida Ocupante   : 3673   CR 64 C CL 67:  132  
##  3rd Qu.:2015   Volcamiento      : 1435   CR 80 CL 50  :  122  
##  Max.   :2015   Caída de Ocupante:   18   CR 52 CL 10  :  121  
##                 (Other)          :    2   (Other)      :41145  
##                     DIRECCION_ENC        CBML      
##  CR  064 C   078  000 00000:  227   1019   : 1067  
##  CR  063   044  000 00000  :  177   0517   :  899  
##  CR  057   044  000 00000  :  173   1007   :  882  
##  CR  064 C   067  000 00000:  132   1006   :  867  
##  CR  080   050  000 00000  :  122   1012   :  804  
##  CR  052   010  000 00000  :  121   1105   :  783  
##  (Other)                   :41128   (Other):36778  
##                           TIPO_GEOCOD          GRAVEDAD    
##  Malla vial                     :31011   HERIDO    :23273  
##  Malla vial cruce invertido     : 2341   MUERTO    :  250  
##  EPM sin Interior               :  886   SOLO DAÑOS:18557  
##  EPM con Interior               :  459                     
##  Malla vial aproximada: CR 1-2  :   86                     
##  Malla vial aproximada: CR 65-80:   85                     
##  (Other)                        : 7212                     
##                 BARRIO                   COMUNA                DISENO     
##  La Candelaria     : 1069   La Candelaria   : 9355   Tramo de via :33729  
##  Caribe            :  899   Laureles Estadio: 4908   Interseccion : 5794  
##  Guayaquil         :  883   Castilla        : 4000   Lote o Predio: 1219  
##  San Benito        :  868   El Poblado      : 3194   Glorieta     :  751  
##  Perpetuo Socorro  :  804   Robledo         : 2884                :  250  
##  Los Conquistadores:  785   Guayabal        : 2814   Paso Elevado :  101  
##  (Other)           :36772   (Other)         :14925   (Other)      :  236  
##      DIA_NOMBRE        MES        
##  DOMINGO  :4048   Min.   : 1.000  
##  JUEVES   :6516   1st Qu.: 4.000  
##  LUNES    :5973   Median : 7.000  
##  MARTES   :6696   Mean   : 6.624  
##  MIÉRCOLES:6341   3rd Qu.:10.000  
##  SÁBADO   :6003   Max.   :12.000  
##  VIERNES  :6503
str(raw_data_2015)
## 'data.frame':    42080 obs. of  19 variables:
##  $ X.U.FEFF.OBJECTID: int  1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 ...
##  $ X                : num  835767 839497 832971 831318 833711 ...
##  $ Y                : num  1183119 1177763 1178867 1187204 1185631 ...
##  $ RADICADO         : num  1471803 1471804 1471805 1471806 1471807 ...
##  $ FECHA            : Factor w/ 365 levels "2015-01-01T00:00:00.000Z",..: 13 13 13 13 13 12 13 13 13 13 ...
##  $ HORA             : Factor w/ 898 levels "01:00 AM","01:00 PM",..: 103 610 691 413 342 651 560 493 342 868 ...
##  $ DIA              : int  13 13 13 13 13 12 13 13 13 13 ...
##  $ PERIODO          : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ CLASE            : Factor w/ 8 levels "","Atropello",..: 8 7 8 7 7 8 7 7 7 4 ...
##  $ DIRECCION        : Factor w/ 12729 levels "CL 1 A CR 35",..: 7461 5630 9229 4297 10332 5630 9640 10693 9766 9166 ...
##  $ DIRECCION_ENC    : Factor w/ 12523 levels "000","CL     010  000 00000",..: 7057 5546 8947 3700 9999 5546 9515 10334 9266 8875 ...
##  $ CBML             : Factor w/ 394 levels "","0","0101",..: 217 389 317 150 106 389 85 133 232 200 ...
##  $ TIPO_GEOCOD      : Factor w/ 2229 levels "Catastro con Interior",..: 5 1014 1645 797 1844 1014 1728 1892 5 5 ...
##  $ GRAVEDAD         : Factor w/ 3 levels "HERIDO","MUERTO",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BARRIO           : Factor w/ 319 levels "","0","6001",..: 51 283 62 210 99 283 170 76 194 248 ...
##  $ COMUNA           : Factor w/ 26 levels "","0","Aranjuez",..: 18 12 15 22 7 12 7 22 19 18 ...
##  $ DISENO           : Factor w/ 13 levels "","Ciclo Ruta",..: 11 11 11 11 11 11 11 11 11 11 ...
##  $ DIA_NOMBRE       : Factor w/ 7 levels "DOMINGO  ","JUEVES   ",..: 4 4 4 4 4 3 4 4 4 4 ...
##  $ MES              : int  1 1 1 1 1 1 1 1 1 1 ...
levels(raw_data_2015$CLASE)
## [1] ""                  "Atropello"         "Caída de Ocupante"
## [4] "Caida Ocupante"    "Choque"            "Incendio"         
## [7] "Otro"              "Volcamiento"
levels(raw_data_2015$CLASE) <- c("Otro","Atropello","Caida_Ocupante","Caida_Ocupante","Choque",
                                 "Otro","Otro","Volcamiento")
levels(raw_data_2015$CLASE)
## [1] "Otro"           "Atropello"      "Caida_Ocupante" "Choque"        
## [5] "Volcamiento"

2016 Data

raw_data_2016 <- read.csv(file="./data/Accidentalidad_georreferenciada_2016.csv", encoding="UTF-8", header=TRUE, sep=",")
head(raw_data_2016)
##   X.U.FEFF.OBJECTID        X       Y RADICADO                    FECHA
## 1            259034 827277.9 1175177  1519832 2016-01-20T00:00:00.000Z
## 2            259035 835105.9 1183570  1519765 2016-01-20T00:00:00.000Z
## 3            259036 834811.1 1182025  1519752 2016-01-20T00:00:00.000Z
## 4            259037 836020.8 1178803  1519682 2016-01-20T00:00:00.000Z
## 5            259038 832264.6 1185704  1519685 2016-01-20T00:00:00.000Z
## 6            259040 831843.3 1183130  1519883 2016-01-20T00:00:00.000Z
##       HORA DIA PERIODO       CLASE         DIRECCION
## 1 11:00 AM  20    2016      Choque CR 55 CL 48 A Sur
## 2 01:25 PM  20    2016      Choque       CL 54 CR 51
## 3 02:35 PM  20    2016      Choque       CR 46 CL 40
## 4 08:40 AM  20    2016      Choque     CR 30 CL 10 C
## 5 10:30 AM  20    2016 Volcamiento       CR 80 CL 64
## 6 10:45 AM  20    2016      Choque       CR 78 CL 44
##                 DIRECCION_ENC CBML                     TIPO_GEOCOD
## 1 CR  055  S 048 A  000 00000 8000                      Malla vial
## 2    CL  054   051  000 00000 1005                      Malla vial
## 3    CR  046   040  000 00000 1013                      Malla vial
## 4  CR  030   010 C  000 00000 1407                      Malla vial
## 5    CR  080   064  000 00000 0705 Malla vial aproximada: CR 80-65
## 6    CR  078   044  000 00000 1112                      Malla vial
##     GRAVEDAD                        BARRIO
## 1 SOLO DAÑOS Cabecera San Antonio de Prado
## 2 SOLO DAÑOS                Estación Villa
## 3 SOLO DAÑOS                  Barrio Colón
## 4 SOLO DAÑOS                Las Lomas No.2
## 5 SOLO DAÑOS Facultad de Minas U. Nacional
## 6 SOLO DAÑOS                  El Velódromo
##                                  COMUNA        DISENO DIA_NOMBRE MES
## 1 Corregimiento de San Antonio de Prado Lote o Predio  MIÉRCOLES   1
## 2                         La Candelaria  Tramo de via  MIÉRCOLES   1
## 3                         La Candelaria  Tramo de via  MIÉRCOLES   1
## 4                            El Poblado  Tramo de via  MIÉRCOLES   1
## 5                               Robledo  Tramo de via  MIÉRCOLES   1
## 6                      Laureles Estadio  Interseccion  MIÉRCOLES   1
str(raw_data_2016)
## 'data.frame':    42841 obs. of  19 variables:
##  $ X.U.FEFF.OBJECTID: int  259034 259035 259036 259037 259038 259040 259041 259042 259044 259045 ...
##  $ X                : num  827278 835106 834811 836021 832265 ...
##  $ Y                : num  1175177 1183570 1182025 1178803 1185704 ...
##  $ RADICADO         : num  1519832 1519765 1519752 1519682 1519685 ...
##  $ FECHA            : Factor w/ 366 levels "2016-01-01T00:00:00.000Z",..: 20 20 20 20 20 20 20 20 20 20 ...
##  $ HORA             : Factor w/ 1092 levels "01:00 AM","01:00 PM",..: 918 36 128 696 878 896 778 81 42 452 ...
##  $ DIA              : int  20 20 20 20 20 20 20 20 20 20 ...
##  $ PERIODO          : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ CLASE            : Factor w/ 8 levels "","Atropello",..: 5 5 5 5 8 5 5 5 5 5 ...
##  $ DIRECCION        : Factor w/ 12962 levels " CR 59 CL 64 -32",..: 9087 3363 7866 6408 11595 11298 1313 78 1793 5222 ...
##  $ DIRECCION_ENC    : Factor w/ 12631 levels "000","CL     053  000 00000",..: 8828 2700 7540 6119 11176 10918 759 284 1275 201 ...
##  $ CBML             : Factor w/ 396 levels "","0","0101",..: 382 197 210 298 122 246 332 298 207 310 ...
##  $ TIPO_GEOCOD      : Factor w/ 2254 levels "Catastro con Interior",..: 5 5 5 5 2013 5 5 2252 5 5 ...
##  $ GRAVEDAD         : Factor w/ 3 levels "HERIDO","MUERTO",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ BARRIO           : Factor w/ 313 levels "","0","6001",..: 56 111 33 175 112 109 242 175 60 97 ...
##  $ COMUNA           : Factor w/ 25 levels "","0","Aranjuez",..: 9 18 18 14 22 19 5 14 18 14 ...
##  $ DISENO           : Factor w/ 13 levels "","Ciclo Ruta",..: 5 11 11 11 11 4 11 4 4 11 ...
##  $ DIA_NOMBRE       : Factor w/ 7 levels "DOMINGO  ","JUEVES   ",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ MES              : int  1 1 1 1 1 1 1 1 1 1 ...
levels(raw_data_2016$CLASE)
## [1] ""                  "Atropello"         "Caída de Ocupante"
## [4] "Caida Ocupante"    "Choque"            "Incendio"         
## [7] "Otro"              "Volcamiento"
levels(raw_data_2016$CLASE) <- c("Otro","Atropello","Caida_Ocupante","Caida_Ocupante","Choque",
                                 "Otro","Otro","Volcamiento")
levels(raw_data_2016$CLASE)
## [1] "Otro"           "Atropello"      "Caida_Ocupante" "Choque"        
## [5] "Volcamiento"

2017 Data

raw_data_2017 <- read.csv(file="./data/Accidentalidad_georreferenciada_2017.csv", encoding="UTF-8", header=TRUE, sep=",")
head(raw_data_2017)
##   X.U.FEFF.OBJECTID        X       Y RADICADO                    FECHA
## 1            504352 834796.6 1181345  1590356 2017-07-20T00:00:00.000Z
## 2            504353 831461.0 1181883  1586285 2017-06-15T00:00:00.000Z
## 3            504354 833229.3 1186996  1588185 2017-07-02T00:00:00.000Z
## 4            504355 833366.8 1181955  1576853 2017-03-29T00:00:00.000Z
## 5            504356 832577.6 1184274  1591283 2017-07-26T00:00:00.000Z
## 6            504358 834246.7 1180257  1578106 2017-04-12T00:00:00.000Z
##       HORA DIA PERIODO          CLASE         DIRECCION
## 1 08:00 AM  20    2017 Caida Ocupante     CR 43 A CL 33
## 2 04:50 PM  15    2017         Choque       CR 80 CL 33
## 3 04:20 PM   2    2017         Choque     CR 80 CL 80 A
## 4 06:29 PM  29    2017         Choque CL 33 Norte CR 65
## 5 08:10 PM  26    2017      Atropello       CL 50 CR 74
## 6 11:10 AM  12    2017         Choque       CR 48 CL 20
##                DIRECCION_ENC CBML TIPO_GEOCOD   GRAVEDAD
## 1 CR  043 A   033  000 00000 1020  Malla vial     HERIDO
## 2   CR  080   033  000 00000 1109  Malla vial SOLO DAÑOS
## 3 CR  080   080 A  000 00000 0710  Malla vial     HERIDO
## 4   CL  033   065  000 00000 1105  Malla vial SOLO DAÑOS
## 5   CL  050   074  000 00000 1115  Malla vial     HERIDO
## 6   CR  048   020  000 00000 1403  Malla vial SOLO DAÑOS
##               BARRIO           COMUNA       DISENO DIA_NOMBRE MES
## 1          San Diego    La Candelaria Interseccion  JUEVES      7
## 2        Las Acacias Laureles Estadio     Glorieta  JUEVES      6
## 3      López de Mesa          Robledo Tramo de via  DOMINGO     7
## 4 Los Conquistadores Laureles Estadio Tramo de via  MIÉRCOLES   3
## 5     Cuarta Brigada Laureles Estadio Tramo de via  MIÉRCOLES   7
## 6      Villa Carlota       El Poblado Tramo de via  MIÉRCOLES   4
str(raw_data_2017)
## 'data.frame':    42563 obs. of  19 variables:
##  $ X.U.FEFF.OBJECTID: int  504352 504353 504354 504355 504356 504358 504359 504360 504361 504362 ...
##  $ X                : num  834797 831461 833229 833367 832578 ...
##  $ Y                : num  1181345 1181883 1186996 1181955 1184274 ...
##  $ RADICADO         : num  1590356 1586285 1588185 1576853 1591283 ...
##  $ FECHA            : Factor w/ 370 levels "2017-01-01T00:00:00.000Z",..: 201 166 183 88 207 102 154 110 180 174 ...
##  $ HORA             : Factor w/ 1106 levels "01:00 AM","01:00 PM",..: 638 321 278 473 657 950 291 362 203 555 ...
##  $ DIA              : int  20 15 2 29 26 12 3 20 29 23 ...
##  $ PERIODO          : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
##  $ CLASE            : Factor w/ 9 levels "Atropello","Caida Ocupante",..: 2 4 4 4 1 4 2 4 4 4 ...
##  $ DIRECCION        : Factor w/ 12879 levels "CL 1 A CR 70",..: 7319 11494 11596 1612 3122 7985 5410 3656 2392 652 ...
##  $ DIRECCION_ENC    : Factor w/ 12595 levels "000","CL     018  000 00000",..: 7116 11091 11186 1035 2406 7639 4443 2902 1738 5432 ...
##  $ CBML             : Factor w/ 403 levels "","0101","0102",..: 217 230 131 224 239 283 112 212 249 303 ...
##  $ TIPO_GEOCOD      : Factor w/ 2569 levels "Catastro con Interior",..: 5 5 5 5 5 5 5 5 5 4 ...
##  $ GRAVEDAD         : Factor w/ 3 levels "HERIDO","MUERTO",..: 1 3 1 3 1 3 1 1 3 3 ...
##  $ BARRIO           : Factor w/ 325 levels "","6001","9086",..: 252 169 185 195 78 314 134 322 89 135 ...
##  $ COMUNA           : Factor w/ 83 levels "","Alejandro Echavarría",..: 46 55 68 55 55 33 30 46 45 33 ...
##  $ DISENO           : Factor w/ 13 levels "","Ciclo Ruta",..: 4 3 11 11 11 11 5 11 4 4 ...
##  $ DIA_NOMBRE       : Factor w/ 7 levels "DOMINGO  ","JUEVES   ",..: 2 2 1 5 5 5 6 2 2 7 ...
##  $ MES              : int  7 6 7 3 7 4 6 4 6 6 ...
levels(raw_data_2017$CLASE)
## [1] "Atropello"          "Caida Ocupante"     "Caída Ocupante"    
## [4] "Choque"             "Choque "            "Choque y Atropello"
## [7] "Incendio"           "Otro"               "Volcamiento"
levels(raw_data_2017$CLASE) <- c("Atropello","Caida_Ocupante","Caida_Ocupante","Choque","Choque",
                                 "Choque","Otro","Otro","Volcamiento")
levels(raw_data_2017$CLASE)
## [1] "Atropello"      "Caida_Ocupante" "Choque"         "Otro"          
## [5] "Volcamiento"
summary(raw_data_2014$CLASE)
##      Atropello Caida_Ocupante         Choque           Otro    Volcamiento 
##           4779           4157          27157           4529            972
summary(raw_data_2015$CLASE)
##           Otro      Atropello Caida_Ocupante         Choque    Volcamiento 
##           4220           4485           3691          28249           1435
summary(raw_data_2016$CLASE)
##           Otro      Atropello Caida_Ocupante         Choque    Volcamiento 
##           4879           4167           3680          28631           1484
summary(raw_data_2017$CLASE)
##      Atropello Caida_Ocupante         Choque           Otro    Volcamiento 
##           3640           3433          29196           4722           1572

2018 Data

raw_data_2018 <- read.csv(file="./data/Accidentalidad_georreferenciada_2018.csv", encoding="UTF-8", header=TRUE, sep=",")
str(raw_data_2018)
## 'data.frame':    40348 obs. of  19 variables:
##  $ X.U.FEFF.OBJECTID: int  550556 550557 550558 550559 550560 550562 550563 550564 550565 550566 ...
##  $ X                : num  833196 833455 835882 831732 835720 ...
##  $ Y                : num  1184350 1187884 1183347 1180062 1181651 ...
##  $ RADICADO         : int  1612819 1612866 1612809 1612812 1612817 1612856 1612823 1612825 1612850 1612815 ...
##  $ FECHA            : Factor w/ 365 levels "2018-01-01T00:00:00.000Z",..: 14 14 14 14 14 14 14 14 14 14 ...
##  $ HORA             : Factor w/ 1485 levels "01:00 AM","01:00 PM",..: 57 57 168 168 182 222 281 334 348 372 ...
##  $ DIA              : int  14 14 14 14 14 14 14 14 14 14 ...
##  $ PERIODO          : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
##  $ CLASE            : Factor w/ 7 levels "Atropello","Caida Ocupante",..: 4 4 2 6 4 1 4 4 4 2 ...
##  $ DIRECCION        : Factor w/ 12504 levels "CL 1 A CR 25",..: 10404 11159 7325 10853 6620 7455 5569 1588 2214 10721 ...
##  $ DIRECCION_ENC    : Factor w/ 12364 levels "000","CL  001   017  000 00000",..: 10180 10919 6991 10609 6409 7256 5538 1097 1679 10470 ...
##  $ CBML             : Factor w/ 375 levels "","0101","0102",..: 207 103 202 307 172 40 214 194 231 104 ...
##  $ TIPO_GEOCOD      : Factor w/ 2405 levels "Catastro con Interior",..: 5 5 5 5 5 3 5 5 5 5 ...
##  $ GRAVEDAD         : Factor w/ 3 levels "HERIDO","MUERTO",..: 1 3 1 1 3 1 3 3 3 1 ...
##  $ BARRIO           : Factor w/ 321 levels "","0","6001",..: 69 262 192 251 108 205 48 237 200 137 ...
##  $ COMUNA           : Factor w/ 24 levels "","Aranjuez",..: 18 12 17 4 5 19 18 17 16 12 ...
##  $ DISENO           : Factor w/ 13 levels "","Ciclo Ruta",..: 4 11 11 4 11 11 4 5 11 4 ...
##  $ DIA_NOMBRE       : Factor w/ 7 levels "DOMINGO  ","JUEVES   ",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MES              : int  1 1 1 1 1 1 1 1 1 1 ...
levels(raw_data_2018$CLASE)
## [1] "Atropello"      "Caida Ocupante" "Caída Ocupante" "Choque"        
## [5] "Incendio"       "Otro"           "Volcamiento"
levels(raw_data_2018$CLASE) <- c("Atropello","Caida_Ocupante","Caida_Ocupante","Choque","Otro","Otro", "Volcamiento")
levels(raw_data_2018$CLASE)
## [1] "Atropello"      "Caida_Ocupante" "Choque"         "Otro"          
## [5] "Volcamiento"
summary(raw_data_2018$CLASE)
##      Atropello Caida_Ocupante         Choque           Otro    Volcamiento 
##           3604           3617          28207           3746           1174

Combining the data

Total_Dataset <- rbind(raw_data_2014,raw_data_2015,raw_data_2016, raw_data_2017, raw_data_2018)
str(Total_Dataset)
## 'data.frame':    209426 obs. of  19 variables:
##  $ X.U.FEFF.OBJECTID: int  211279 211280 211281 211282 211283 211284 211285 211286 211287 211288 ...
##  $ X                : num  831190 835014 837033 830649 833741 ...
##  $ Y                : num  1179690 1184224 1184750 1181383 1188644 ...
##  $ RADICADO         : num  1423828 1423839 1423840 1423849 1423890 ...
##  $ FECHA            : Factor w/ 1831 levels "2014-01-01T00:00:00.000Z",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ HORA             : Factor w/ 2239 levels "01:00 AM","01:00 PM",..: 95 954 1 938 777 201 469 268 37 72 ...
##  $ DIA              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PERIODO          : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
##  $ CLASE            : Factor w/ 5 levels "Atropello","Caida_Ocupante",..: 3 1 1 1 2 3 3 3 3 2 ...
##  $ DIRECCION        : Factor w/ 29182 levels "CL 1 A CR 42",..: 11441 8871 6987 1424 11297 8063 3081 6269 11142 8131 ...
##  $ DIRECCION_ENC    : Factor w/ 27999 levels "000","CL     029  000 00000",..: 11024 8642 6786 915 11166 7817 2469 6072 10863 7898 ...
##  $ CBML             : Factor w/ 699 levels "","0101","01010480009",..: 333 192 39 340 105 48 170 176 241 280 ...
##  $ TIPO_GEOCOD      : Factor w/ 5624 levels "Catastro con Interior",..: 2064 5 5 5 5 5 5 5 5 5 ...
##  $ GRAVEDAD         : Factor w/ 3 levels "HERIDO","MUERTO",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BARRIO           : Factor w/ 340 levels "","6001","Aguas Frias",..: 181 129 201 175 81 40 31 155 294 302 ...
##  $ COMUNA           : Factor w/ 85 levels "","Aranjuez",..: 4 16 18 4 11 2 5 5 17 12 ...
##  $ DISENO           : Factor w/ 13 levels "","Ciclo Ruta",..: 11 11 11 11 11 4 11 4 11 11 ...
##  $ DIA_NOMBRE       : Factor w/ 7 levels "DOMINGO  ","JUEVES   ",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ MES              : int  1 1 1 1 1 1 1 1 1 1 ...
levels(Total_Dataset$CLASE)
## [1] "Atropello"      "Caida_Ocupante" "Choque"         "Otro"          
## [5] "Volcamiento"
summary(Total_Dataset$CLASE)
##      Atropello Caida_Ocupante         Choque           Otro    Volcamiento 
##          20675          18578         141440          22096           6637
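
As a consistency check on the combined data, the per-year class counts reported in the `summary()` outputs should add up to the totals shown here. The sketch below retypes those counts by hand into a hypothetical `counts` matrix and sums them.

```r
# Per-year counts copied from the summary(raw_data_YYYY$CLASE) outputs.
counts <- rbind(
  "2014" = c(Atropello = 4779, Caida_Ocupante = 4157, Choque = 27157, Otro = 4529, Volcamiento = 972),
  "2015" = c(Atropello = 4485, Caida_Ocupante = 3691, Choque = 28249, Otro = 4220, Volcamiento = 1435),
  "2016" = c(Atropello = 4167, Caida_Ocupante = 3680, Choque = 28631, Otro = 4879, Volcamiento = 1484),
  "2017" = c(Atropello = 3640, Caida_Ocupante = 3433, Choque = 29196, Otro = 4722, Volcamiento = 1572),
  "2018" = c(Atropello = 3604, Caida_Ocupante = 3617, Choque = 28207, Otro = 3746, Volcamiento = 1174)
)

colSums(counts)   # totals per class, to compare with summary(Total_Dataset$CLASE)
rowSums(counts)   # totals per year, to compare with the row counts from str()
sum(counts)       # overall total, to compare with nrow(Total_Dataset)
```

In a live session the same table can be obtained directly with `table(Total_Dataset$PERIODO, Total_Dataset$CLASE)`.
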

To save memory, the per-year data frames are removed:

rm(raw_data_2014,raw_data_2015,raw_data_2016, raw_data_2017, raw_data_2018)

FECHA was loaded as a factor, so it is converted to Date format:

Total_Dataset$FECHA <- as.Date(Total_Dataset$FECHA, format="%Y-%m-%d")
tail(Total_Dataset)
##        X.U.FEFF.OBJECTID        X       Y RADICADO      FECHA
## 209421            686550 836789.8 1186516  1652251 2018-12-05
## 209422            686551 834601.3 1186742  1652236 2018-12-05
## 209423            686552 832577.7 1183055  1652193 2018-12-05
## 209424            686553 833974.2 1183603  1652324 2018-12-05
## 209425            686554 834299.9 1180877  1652349 2018-12-05
## 209426            686555 836838.6 1186467  1652347 2018-12-05
##                  HORA DIA PERIODO  CLASE     DIRECCION
## 209421 03:20:00 p. m.   5    2018 Choque CR 44 A CL 88
## 209422 03:20:00 p. m.   5    2018 Choque CR 65 CL 91 C
## 209423 03:20:00 p. m.   5    2018 Choque   CL 44 CR 71
## 209424 03:30:00 p. m.   5    2018 Choque   CR 63 CL 50
## 209425 03:40:00 p. m.   5    2018 Choque   CL 27 CR 46
## 209426 04:10:00 p. m.   5    2018   Otro   CR 44 CL 88
##                     DIRECCION_ENC        CBML
## 209421 CR  044 A   088  000 00000 03020770009
## 209422 CR  065   091 C  000 00000        0513
## 209423   CL  044   071  000 00000        1117
## 209424   CR  063   050  000 00000        1101
## 209425   CL  027   046  000 00000        1401
## 209426   CR  044   088  000 00000        0302
##                             TIPO_GEOCOD   GRAVEDAD                BARRIO
## 209421        Nomenclatura con Interior     HERIDO           Las Granjas
## 209422 Malla vial aproximada: CR 65-91A     HERIDO Francisco Antonio Zea
## 209423                       Malla vial SOLO DAÑOS         Florida Nueva
## 209424                       Malla vial     HERIDO    Carlos E. Restrepo
## 209425                       Malla vial SOLO DAÑOS       Barrio Colombia
## 209426                       Malla vial     HERIDO           Las Granjas
##                  COMUNA        DISENO DIA_NOMBRE MES
## 209421         Manrique  Interseccion  MIÉRCOLES  12
## 209422         Castilla  Interseccion  MIÉRCOLES  12
## 209423 Laureles Estadio  Tramo de via  MIÉRCOLES  12
## 209424 Laureles Estadio  Tramo de via  MIÉRCOLES  12
## 209425       El Poblado Lote o Predio  MIÉRCOLES  12
## 209426         Manrique  Interseccion  MIÉRCOLES  12
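
As a minimal sketch of why this conversion works, assuming FECHA holds ISO-style timestamps like the ones in the 2014 extract: `as.Date` parses the leading `%Y-%m-%d` part and ignores the trailing time component. The vector `fecha_raw` is a hypothetical stand-in for the real column.

```r
# Hypothetical stand-in for the FECHA column, loaded as a factor
fecha_raw <- factor(c("2014-01-01T00:00:00.000Z",
                      "2014-01-02T00:00:00.000Z"))

# as.Date() works on the factor labels and parses only the date prefix
fecha <- as.Date(fecha_raw, format = "%Y-%m-%d")

class(fecha)   # "Date"
format(fecha)  # dates as plain text, without the time part
```

If the timestamps carried meaningful hours, this step would discard them; here the time of day lives in the separate HORA column.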

The DIA and MES variables are converted to factors.

Total_Dataset$DIA <- as.factor(Total_Dataset$DIA)
Total_Dataset$MES <- as.factor(Total_Dataset$MES)
tail(Total_Dataset)
##        X.U.FEFF.OBJECTID        X       Y RADICADO      FECHA
## 209421            686550 836789.8 1186516  1652251 2018-12-05
## 209422            686551 834601.3 1186742  1652236 2018-12-05
## 209423            686552 832577.7 1183055  1652193 2018-12-05
## 209424            686553 833974.2 1183603  1652324 2018-12-05
## 209425            686554 834299.9 1180877  1652349 2018-12-05
## 209426            686555 836838.6 1186467  1652347 2018-12-05
##                  HORA DIA PERIODO  CLASE     DIRECCION
## 209421 03:20:00 p. m.   5    2018 Choque CR 44 A CL 88
## 209422 03:20:00 p. m.   5    2018 Choque CR 65 CL 91 C
## 209423 03:20:00 p. m.   5    2018 Choque   CL 44 CR 71
## 209424 03:30:00 p. m.   5    2018 Choque   CR 63 CL 50
## 209425 03:40:00 p. m.   5    2018 Choque   CL 27 CR 46
## 209426 04:10:00 p. m.   5    2018   Otro   CR 44 CL 88
##                     DIRECCION_ENC        CBML
## 209421 CR  044 A   088  000 00000 03020770009
## 209422 CR  065   091 C  000 00000        0513
## 209423   CL  044   071  000 00000        1117
## 209424   CR  063   050  000 00000        1101
## 209425   CL  027   046  000 00000        1401
## 209426   CR  044   088  000 00000        0302
##                             TIPO_GEOCOD   GRAVEDAD                BARRIO
## 209421        Nomenclatura con Interior     HERIDO           Las Granjas
## 209422 Malla vial aproximada: CR 65-91A     HERIDO Francisco Antonio Zea
## 209423                       Malla vial SOLO DAÑOS         Florida Nueva
## 209424                       Malla vial     HERIDO    Carlos E. Restrepo
## 209425                       Malla vial SOLO DAÑOS       Barrio Colombia
## 209426                       Malla vial     HERIDO           Las Granjas
##                  COMUNA        DISENO DIA_NOMBRE MES
## 209421         Manrique  Interseccion  MIÉRCOLES  12
## 209422         Castilla  Interseccion  MIÉRCOLES  12
## 209423 Laureles Estadio  Tramo de via  MIÉRCOLES  12
## 209424 Laureles Estadio  Tramo de via  MIÉRCOLES  12
## 209425       El Poblado Lote o Predio  MIÉRCOLES  12
## 209426         Manrique  Interseccion  MIÉRCOLES  12

The frequency table is generated: the number of accidents per date and class. This is the table that will be analyzed.

library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
Total_Dataset_Freq <- sqldf("SELECT FECHA, CLASE, count(CLASE) AS FREQ, DIA_NOMBRE, MES, DIA 
       FROM Total_Dataset
       GROUP BY FECHA, CLASE")
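
The same count per date and class can be sketched in base R, which may help readers who do not use sqldf. This is an illustrative equivalent on a hypothetical toy data frame (`toy`), not the code used in the report.

```r
# Toy data: one row per accident (hypothetical values)
toy <- data.frame(
  FECHA = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-02")),
  CLASE = c("Choque", "Choque", "Atropello", "Choque")
)

# Count rows per (FECHA, CLASE) pair, like COUNT(...) GROUP BY in SQL
toy$FREQ <- 1L
freq <- aggregate(FREQ ~ FECHA + CLASE, data = toy, FUN = sum)
freq <- freq[order(freq$FECHA, freq$CLASE), ]
freq
```

Note that the SQL query also selects DIA_NOMBRE, MES and DIA without grouping by them. That is harmless here only because those columns are fully determined by FECHA.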

The year variable (ANO) is added to the data frame.

Total_Dataset_Freq$ANO <- as.factor(format(Total_Dataset_Freq$FECHA,'%Y'))

The week variable (SEMANA) is added to the data frame. The `%V` format returns the ISO 8601 week number.

Total_Dataset_Freq$SEMANA <-as.factor(format(Total_Dataset_Freq$FECHA,'%V'))
tail(Total_Dataset_Freq)
##           FECHA          CLASE FREQ DIA_NOMBRE MES DIA  ANO SEMANA
## 9023 2018-12-30    Volcamiento    2  DOMINGO    12  30 2018     52
## 9024 2018-12-31      Atropello    8  LUNES      12  31 2018     01
## 9025 2018-12-31 Caida_Ocupante    6  LUNES      12  31 2018     01
## 9026 2018-12-31         Choque   50  LUNES      12  31 2018     01
## 9027 2018-12-31           Otro   10  LUNES      12  31 2018     01
## 9028 2018-12-31    Volcamiento    2  LUNES      12  31 2018     01
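
The output above shows 2018-12-31 assigned to week 01 while ANO is 2018. This follows from the ISO 8601 definition of `%V`: the last days of December can belong to week 1 of the following ISO year. The short check below illustrates this; pairing `%V` with `%G` (the ISO week-based year) instead of `%Y` is one possible way to keep year and week consistent, offered here as a suggestion rather than something the original analysis does.

```r
d <- as.Date("2018-12-31")

format(d, "%Y")  # calendar year
format(d, "%V")  # ISO 8601 week number
format(d, "%G")  # ISO week-based year that the week number belongs to
```

With `%Y` and `%V` combined, any aggregate by ANO and SEMANA groups 2018-12-31 together with the first days of January 2018.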

The DIA_NOMBRE variable is cleaned: trailing spaces and accents are removed, and the factor levels are ordered from Monday to Sunday.

levels(Total_Dataset_Freq$DIA_NOMBRE)
## [1] "DOMINGO  " "JUEVES   " "LUNES    " "MARTES   " "MIÉRCOLES" "SÁBADO   "
## [7] "VIERNES  "
levels(Total_Dataset_Freq$DIA_NOMBRE) <- c("DOMINGO","JUEVES","LUNES","MARTES","MIERCOLES","SABADO","VIERNES") 
levels(Total_Dataset_Freq$DIA_NOMBRE)
## [1] "DOMINGO"   "JUEVES"    "LUNES"     "MARTES"    "MIERCOLES" "SABADO"   
## [7] "VIERNES"
Total_Dataset_Freq$DIA_NOMBRE <- ordered(Total_Dataset_Freq$DIA_NOMBRE,c("LUNES", "MARTES", "MIERCOLES", "JUEVES", "VIERNES", "SABADO", "DOMINGO"))
levels(Total_Dataset_Freq$DIA_NOMBRE)
## [1] "LUNES"     "MARTES"    "MIERCOLES" "JUEVES"    "VIERNES"   "SABADO"   
## [7] "DOMINGO"
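
Reassigning `levels()` by position works only if the new labels are typed in exactly the same order as the existing ones. A sketch of a less order-dependent alternative, on a hypothetical vector `dias`, cleans the labels programmatically and then sets the order explicitly:

```r
# Hypothetical sample with padded and accented labels
dias <- factor(c("LUNES    ", "MI\u00c9RCOLES", "S\u00c1BADO   ", "DOMINGO  "))

# Trim whitespace and replace the two accented capitals that occur
limpio <- trimws(as.character(dias))
limpio <- gsub("\u00c9", "E", limpio, fixed = TRUE)
limpio <- gsub("\u00c1", "A", limpio, fixed = TRUE)

# Ordered factor from Monday to Sunday
orden <- c("LUNES", "MARTES", "MIERCOLES", "JUEVES",
           "VIERNES", "SABADO", "DOMINGO")
dias_ord <- factor(limpio, levels = orden, ordered = TRUE)
levels(dias_ord)
```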

Next, additional variables that characterize special dates are added. A CSV file is loaded and left-joined to the Total_Dataset_Freq table.

Dias_Especiales <- read.csv(file="./data/Caracterizacion.csv", encoding="UTF-8", header=TRUE, sep=";",stringsAsFactors=FALSE)
head(Dias_Especiales)
##      Fecha Lunes martes miercoles jueves viernes sabado domingo Enero
## 1 1-1-2014     0      0         1      0       0      0       0     1
## 2 2-1-2014     0      0         0      1       0      0       0     1
## 3 3-1-2014     0      0         0      0       1      0       0     1
## 4 4-1-2014     0      0         0      0       0      1       0     1
## 5 5-1-2014     0      0         0      0       0      0       1     1
## 6 6-1-2014     1      0         0      0       0      0       0     1
##   Febrero Marzo Abril Mayo Junio Julio Agosto Septiembre Octubre Noviembre
## 1       0     0     0    0     0     0      0          0       0         0
## 2       0     0     0    0     0     0      0          0       0         0
## 3       0     0     0    0     0     0      0          0       0         0
## 4       0     0     0    0     0     0      0          0       0         0
## 5       0     0     0    0     0     0      0          0       0         0
## 6       0     0     0    0     0     0      0          0       0         0
##   Diciembre Feriado Semana.Santa Prima Mujer Padre Madre AmoryAmistad
## 1         0       1            0     0     0     0     0            0
## 2         0       0            0     0     0     0     0            0
## 3         0       0            0     0     0     0     0            0
## 4         0       0            0     0     0     0     0            0
## 5         0       0            0     0     0     0     0            0
## 6         0       1            0     0     0     0     0            0
##   Viernes_Antes_Puente Quincena Viernes_Desp_Qincena
## 1                    0        0                    0
## 2                    0        0                    0
## 3                    1        0                    0
## 4                    0        0                    0
## 5                    0        0                    0
## 6                    0        0                    0

The Fecha variable was loaded as text, so it is converted to Date format and renamed FECHA to match the key of the Total_Dataset_Freq table.

Dias_Especiales <- Dias_Especiales[1:1826,]
Dias_Especiales$FECHA <- as.Date(Dias_Especiales$Fecha, format="%d-%m-%Y")
Dias_Especiales$Fecha <- NULL

These special days are joined to the Total_Dataset_Freq data frame.

Total_Dataset_Freq <- sqldf("SELECT * 
              FROM Total_Dataset_Freq
              LEFT JOIN Dias_Especiales USING(FECHA)")
head(Total_Dataset_Freq)
##        FECHA          CLASE FREQ DIA_NOMBRE MES DIA  ANO SEMANA Lunes
## 1 2014-01-01      Atropello   13  MIERCOLES   1   1 2014     01     0
## 2 2014-01-01 Caida_Ocupante    7  MIERCOLES   1   1 2014     01     0
## 3 2014-01-01         Choque   35  MIERCOLES   1   1 2014     01     0
## 4 2014-01-01           Otro   18  MIERCOLES   1   1 2014     01     0
## 5 2014-01-01    Volcamiento    1  MIERCOLES   1   1 2014     01     0
## 6 2014-01-02      Atropello   12     JUEVES   1   2 2014     01     0
##   martes miercoles jueves viernes sabado domingo Enero Febrero Marzo Abril
## 1      0         1      0       0      0       0     1       0     0     0
## 2      0         1      0       0      0       0     1       0     0     0
## 3      0         1      0       0      0       0     1       0     0     0
## 4      0         1      0       0      0       0     1       0     0     0
## 5      0         1      0       0      0       0     1       0     0     0
## 6      0         0      1       0      0       0     1       0     0     0
##   Mayo Junio Julio Agosto Septiembre Octubre Noviembre Diciembre Feriado
## 1    0     0     0      0          0       0         0         0       1
## 2    0     0     0      0          0       0         0         0       1
## 3    0     0     0      0          0       0         0         0       1
## 4    0     0     0      0          0       0         0         0       1
## 5    0     0     0      0          0       0         0         0       1
## 6    0     0     0      0          0       0         0         0       0
##   Semana.Santa Prima Mujer Padre Madre AmoryAmistad Viernes_Antes_Puente
## 1            0     0     0     0     0            0                    0
## 2            0     0     0     0     0            0                    0
## 3            0     0     0     0     0            0                    0
## 4            0     0     0     0     0            0                    0
## 5            0     0     0     0     0            0                    0
## 6            0     0     0     0     0            0                    0
##   Quincena Viernes_Desp_Qincena
## 1        0                    0
## 2        0                    0
## 3        0                    0
## 4        0                    0
## 5        0                    0
## 6        0                    0
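
For reference, the left join above has a base R counterpart in `merge(..., all.x = TRUE)`. The sketch below uses two small hypothetical tables (`freq`, `especiales`) to show that rows without a match are kept and filled with NA.

```r
# Hypothetical left table: daily counts
freq <- data.frame(
  FECHA = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03")),
  FREQ  = c(13L, 12L, 7L)
)

# Hypothetical right table: special-day flags, missing 2014-01-03
especiales <- data.frame(
  FECHA   = as.Date(c("2014-01-01", "2014-01-02")),
  Feriado = c(1L, 0L)
)

# all.x = TRUE keeps every row of the left table (left join)
unido <- merge(freq, especiales, by = "FECHA", all.x = TRUE)
unido
```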

DESCRIPTIVE ANALYSIS

For Total Accidents

library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(data=Total_Dataset_Freq,
        x = ~FECHA,
        y = ~FREQ,
        type = "scatter", mode = "lines",
        split = ~ANO,
        line=list(width=1))%>%
  layout(title='Accidentes_Medellin',
         xaxis=list(title="Dia"),
         yaxis=list(title="Unidades"))
plot_ly(data=Total_Dataset_Freq,
        x = ~ANO,
        y = ~FREQ,
        type = "box")%>%
  layout(title='Accidentes_Medellin',
         xaxis=list(title="ano"),
         yaxis=list(title="Unidades"))
plot_ly(data=Total_Dataset_Freq,
        x = ~MES,
        y = ~FREQ,
        type = "box")%>%
  layout(title='Accidentes_Medellin',
         xaxis=list(title="Mes"),
         yaxis=list(title="Unidades"))
plot_ly(data=Total_Dataset_Freq,
        x = ~DIA_NOMBRE,
        y = ~FREQ,
        type = "box")%>%
  layout(title='Accidentes_Medellin',
         xaxis=list(title="Dia_Nombre"),
         yaxis=list(title="Unidades"))

Except for Sunday, which is lower, the average accident count is very similar across the days of the week. The days differ mainly in the right tail of the distribution.

plot_ly(data=Total_Dataset_Freq,
        x = ~SEMANA,
        y = ~FREQ,
        type = "box")%>%
  layout(title='Accidentes_Medellin',
         xaxis=list(title="SEMANA"),
         yaxis=list(title="Unidades"))
attach(Total_Dataset_Freq)
aggregate(FREQ~ANO*SEMANA, data=Total_Dataset_Freq,FUN=mean)%>%
  plot_ly(x = ~SEMANA,
         y = ~FREQ,
         type = "scatter" ,mode = "lines",
         split = ~ANO,
         line=list(width=1))%>%
  layout(title='Promedio diario semanal de ACCIDENTES en Medellin',
         xaxis=list(title="SEMANA"),
         yaxis=list(title="No. Accidentes"))
aggregate(FREQ~ANO*MES, data=Total_Dataset_Freq,FUN=mean)%>%
  plot_ly(x = ~MES,
         y = ~FREQ,
         type = "scatter" ,mode = "lines",
         split = ~ANO,
         line=list(width=1))%>%
  layout(title='Promedio diario mensual de ACCIDENTES en Medellin',
         xaxis=list(title="Mes"),
         yaxis=list(title="No. Accidentes"))
aggregate(FREQ~CLASE*MES, data=Total_Dataset_Freq,FUN=mean)%>%
  plot_ly(x = ~MES,
         y = ~FREQ,
         type = "scatter" ,mode = "lines",
         split = ~CLASE,
         line=list(width=1))%>%
  layout(title='Promedio diario mensual de ACCIDENTES en Medellin por CLASE',
         xaxis=list(title="Mes"),
         yaxis=list(title="No. Accidentes"))
aggregate(FREQ~CLASE*SEMANA, data=Total_Dataset_Freq,FUN=mean)%>%
  plot_ly(x = ~SEMANA,
         y = ~FREQ,
         type = "scatter" ,mode = "lines",
         split = ~CLASE,
         line=list(width=1))%>%
  layout(title='Promedio diario SEMANAL de ACCIDENTES en Medellin por CLASE',
         xaxis=list(title="Semana"),
         yaxis=list(title="No. Accidentes"))
aggregate(FREQ~ANO*DIA_NOMBRE, data=Total_Dataset_Freq,FUN=mean)%>%
  plot_ly(x = ~DIA_NOMBRE,
         y = ~FREQ,
         type = "scatter" ,mode = "lines",
         split = ~ANO,
         line=list(width=1))%>%
  layout(title='Promedio diario DIA DE LA SEMANA de Accidentes de Transito en Medellin',
         xaxis=list(title="Dia Semana"),
         yaxis=list(title="No. Accidentes"))
aggregate(FREQ~ANO*CLASE, data=Total_Dataset_Freq,FUN=mean)%>%
  plot_ly(x = ~CLASE,
         y = ~FREQ,
         type = "scatter" ,mode = "lines",
         split = ~ANO,
         line=list(width=1))%>%
  layout(title='Promedio diario CLASE de Accidentes de Transito en Medellin',
         xaxis=list(title="Clase"),
         yaxis=list(title="No. Accidentes"))

Predictions will be made for each of the 5 accident classes and for three time periods (monthly, weekly and daily), so 15 models will be generated: 5 accident classes times 3 prediction periods.

The data are therefore partitioned by frequency: M: Monthly, S: Weekly, D: Daily.

First, the data frame is reshaped to consolidate the cases by day and to spread the levels of the CLASE factor into separate columns.

library(reshape)
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:plotly':
## 
##     rename
Total_Dataset_Freq <- cast(Total_Dataset_Freq[,c(1,2,3)],FECHA~CLASE)
## Using FREQ as value column.  Use the value argument to cast to override this choice
Total_Dataset_Freq <- sqldf("SELECT * 
              FROM Total_Dataset_Freq
              LEFT JOIN Dias_Especiales USING(FECHA)")
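
The long-to-wide step can also be sketched in base R with `tapply`, which makes it easier to see where the NA values replaced later in the report come from: a date and class combination with no accidents has no row in the long table. The data frame `largo` is a hypothetical example.

```r
# Hypothetical long table: one row per (FECHA, CLASE) with its count
largo <- data.frame(
  FECHA = as.Date(c("2014-01-01", "2014-01-01", "2014-01-02")),
  CLASE = c("Atropello", "Choque", "Choque"),
  FREQ  = c(13L, 35L, 43L)
)

# One row per date, one column per class; missing combinations become NA
ancho <- tapply(largo$FREQ,
                list(FECHA = format(largo$FECHA), CLASE = largo$CLASE),
                sum)

# A missing combination means zero accidents of that class on that day
ancho[is.na(ancho)] <- 0
ancho
```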

The year variable (ANO) is added to the data frame.

Total_Dataset_Freq$ANO <- as.factor(format(Total_Dataset_Freq$FECHA,'%Y'))

The week variable (SEMANA) is added to the data frame.

Total_Dataset_Freq$SEMANA <-as.factor(format(Total_Dataset_Freq$FECHA,'%V'))

The day-of-month variable (DIA) is added to the data frame.

Total_Dataset_Freq$DIA <-as.factor(format(Total_Dataset_Freq$FECHA,'%d'))

The day of the week (DIA_SEMANA) is added.

Total_Dataset_Freq$DIA_SEMANA <-as.factor(weekdays(Total_Dataset_Freq$FECHA))

The month (MES) is added.

Total_Dataset_Freq$MES <-as.factor(format(Total_Dataset_Freq$FECHA,'%m'))
head(Total_Dataset_Freq)
##        FECHA Atropello Caida_Ocupante Choque Otro Volcamiento Lunes martes
## 1 2014-01-01        13              7     35   18           1     0      0
## 2 2014-01-02        12              7     43    9           1     0      0
## 3 2014-01-03         7              5     67   13           1     0      0
## 4 2014-01-04        11              7     40    9           1     0      0
## 5 2014-01-05         6              5     43   10           3     0      0
## 6 2014-01-06         4              5     23    9           2     1      0
##   miercoles jueves viernes sabado domingo Enero Febrero Marzo Abril Mayo
## 1         1      0       0      0       0     1       0     0     0    0
## 2         0      1       0      0       0     1       0     0     0    0
## 3         0      0       1      0       0     1       0     0     0    0
## 4         0      0       0      1       0     1       0     0     0    0
## 5         0      0       0      0       1     1       0     0     0    0
## 6         0      0       0      0       0     1       0     0     0    0
##   Junio Julio Agosto Septiembre Octubre Noviembre Diciembre Feriado
## 1     0     0      0          0       0         0         0       1
## 2     0     0      0          0       0         0         0       0
## 3     0     0      0          0       0         0         0       0
## 4     0     0      0          0       0         0         0       0
## 5     0     0      0          0       0         0         0       0
## 6     0     0      0          0       0         0         0       1
##   Semana.Santa Prima Mujer Padre Madre AmoryAmistad Viernes_Antes_Puente
## 1            0     0     0     0     0            0                    0
## 2            0     0     0     0     0            0                    0
## 3            0     0     0     0     0            0                    1
## 4            0     0     0     0     0            0                    0
## 5            0     0     0     0     0            0                    0
## 6            0     0     0     0     0            0                    0
##   Quincena Viernes_Desp_Qincena  ANO SEMANA DIA DIA_SEMANA MES
## 1        0                    0 2014     01  01  miércoles  01
## 2        0                    0 2014     01  02     jueves  01
## 3        0                    0 2014     01  03    viernes  01
## 4        0                    0 2014     01  04     sábado  01
## 5        0                    0 2014     01  05    domingo  01
## 6        0                    0 2014     02  06      lunes  01

NA values are replaced with zeros.

Total_Dataset_Freq[is.na(Total_Dataset_Freq)] <- 0

The Total_Accidentes column is added as the sum of the five classes.

Total_Dataset_Freq$Total_Accidentes <- Total_Dataset_Freq$Atropello + Total_Dataset_Freq$Caida_Ocupante + Total_Dataset_Freq$Choque + Total_Dataset_Freq$Otro + Total_Dataset_Freq$Volcamiento

The Accidentes_Graves (serious accidents) column is added as the sum of Atropello, Caida_Ocupante and Volcamiento.

Total_Dataset_Freq$Accidentes_Graves <- Total_Dataset_Freq$Atropello + Total_Dataset_Freq$Caida_Ocupante + Total_Dataset_Freq$Volcamiento
tail(Total_Dataset_Freq)
##           FECHA Atropello Caida_Ocupante Choque Otro Volcamiento Lunes
## 1821 2018-12-26        11              7     84    6           4     0
## 1822 2018-12-27        13              3     77    9           2     0
## 1823 2018-12-28         6              6     84    7           1     0
## 1824 2018-12-29        15              4     59    7           3     0
## 1825 2018-12-30         5              7     33   10           2     0
## 1826 2018-12-31         8              6     50   10           2     1
##      martes miercoles jueves viernes sabado domingo Enero Febrero Marzo
## 1821      0         1      0       0      0       0     0       0     0
## 1822      0         0      1       0      0       0     0       0     0
## 1823      0         0      0       1      0       0     0       0     0
## 1824      0         0      0       0      1       0     0       0     0
## 1825      0         0      0       0      0       1     0       0     0
## 1826      0         0      0       0      0       0     0       0     0
##      Abril Mayo Junio Julio Agosto Septiembre Octubre Noviembre Diciembre
## 1821     0    0     0     0      0          0       0         0         1
## 1822     0    0     0     0      0          0       0         0         1
## 1823     0    0     0     0      0          0       0         0         1
## 1824     0    0     0     0      0          0       0         0         1
## 1825     0    0     0     0      0          0       0         0         1
## 1826     0    0     0     0      0          0       0         0         1
##      Feriado Semana.Santa Prima Mujer Padre Madre AmoryAmistad
## 1821       0            0     0     0     0     0            0
## 1822       0            0     0     0     0     0            0
## 1823       0            0     0     0     0     0            0
## 1824       0            0     0     0     0     0            0
## 1825       0            0     0     0     0     0            0
## 1826       0            0     0     0     0     0            0
##      Viernes_Antes_Puente Quincena Viernes_Desp_Qincena  ANO SEMANA DIA
## 1821                    0        0                    0 2018     52  26
## 1822                    0        0                    0 2018     52  27
## 1823                    0        0                    0 2018     52  28
## 1824                    0        0                    0 2018     52  29
## 1825                    0        0                    0 2018     52  30
## 1826                    0        1                    0 2018     01  31
##      DIA_SEMANA MES Total_Accidentes Accidentes_Graves
## 1821  miércoles  12              112                22
## 1822     jueves  12              104                18
## 1823    viernes  12              104                13
## 1824     sábado  12               88                22
## 1825    domingo  12               57                14
## 1826      lunes  12               76                16
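
The two derived columns can be written more compactly with `rowSums`. The sketch below rebuilds them for the first two days of 2014, using the class counts printed earlier in the report, so the totals can be checked against the values shown for those rows (74 and 72 total, 21 and 20 serious).

```r
# Counts for 2014-01-01 and 2014-01-02, copied from the report output
dias <- data.frame(
  Atropello      = c(13, 12),
  Caida_Ocupante = c(7, 7),
  Choque         = c(35, 43),
  Otro           = c(18, 9),
  Volcamiento    = c(1, 1)
)

clases <- c("Atropello", "Caida_Ocupante", "Choque", "Otro", "Volcamiento")
graves <- c("Atropello", "Caida_Ocupante", "Volcamiento")

dias$Total_Accidentes  <- rowSums(dias[, clases])
dias$Accidentes_Graves <- rowSums(dias[, graves])
dias
```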

MODEL EVALUATION

1. Partitioning the data into training and test sets

Train_D_Dataset <- subset(Total_Dataset_Freq, ANO!="2018")
summary(Train_D_Dataset$ANO)
## 2014 2015 2016 2017 2018 
##  365  365  366  365    0

The levels of the ANO factor are reset to drop the unused 2018 level.

Train_D_Dataset$ANO <- factor(Train_D_Dataset$ANO)
summary(Train_D_Dataset$ANO)
## 2014 2015 2016 2017 
##  365  365  366  365
library(sqldf)
Test_D_Dataset <- sqldf("SELECT *  
       FROM Total_Dataset_Freq
       WHERE ANO == 2018")
summary(Test_D_Dataset$ANO)
## 2014 2015 2016 2017 2018 
##    0    0    0    0  365

The levels of the ANO factor are reset again so that only 2018 remains.

Test_D_Dataset$ANO <- factor(Test_D_Dataset$ANO)
summary(Test_D_Dataset$ANO)
## 2018 
##  365
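
The split and the level reset can be combined in one step with `droplevels`. This is a sketch on a hypothetical toy data frame (`toy`), shown as an alternative to calling `factor()` again after each subset.

```r
# Hypothetical daily table with a year factor
toy <- data.frame(
  ANO = factor(c("2016", "2017", "2018", "2018")),
  Total_Accidentes = c(70, 80, 90, 60)
)

# Time-based split: earlier years for training, 2018 for testing.
# droplevels() removes factor levels that no longer occur in each subset.
train <- droplevels(subset(toy, ANO != "2018"))
test  <- droplevels(subset(toy, ANO == "2018"))

levels(train$ANO)
levels(test$ANO)
```

Holding out the most recent year, rather than a random sample, keeps the evaluation closer to the intended use of the models, which is to predict future periods.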

2. KNN

Total Accidents

set.seed(123) # fixes the random number generator seed for reproducibility
head(Train_D_Dataset)
##        FECHA Atropello Caida_Ocupante Choque Otro Volcamiento Lunes martes
## 1 2014-01-01        13              7     35   18           1     0      0
## 2 2014-01-02        12              7     43    9           1     0      0
## 3 2014-01-03         7              5     67   13           1     0      0
## 4 2014-01-04        11              7     40    9           1     0      0
## 5 2014-01-05         6              5     43   10           3     0      0
## 6 2014-01-06         4              5     23    9           2     1      0
##   miercoles jueves viernes sabado domingo Enero Febrero Marzo Abril Mayo
## 1         1      0       0      0       0     1       0     0     0    0
## 2         0      1       0      0       0     1       0     0     0    0
## 3         0      0       1      0       0     1       0     0     0    0
## 4         0      0       0      1       0     1       0     0     0    0
## 5         0      0       0      0       1     1       0     0     0    0
## 6         0      0       0      0       0     1       0     0     0    0
##   Junio Julio Agosto Septiembre Octubre Noviembre Diciembre Feriado
## 1     0     0      0          0       0         0         0       1
## 2     0     0      0          0       0         0         0       0
## 3     0     0      0          0       0         0         0       0
## 4     0     0      0          0       0         0         0       0
## 5     0     0      0          0       0         0         0       0
## 6     0     0      0          0       0         0         0       1
##   Semana.Santa Prima Mujer Padre Madre AmoryAmistad Viernes_Antes_Puente
## 1            0     0     0     0     0            0                    0
## 2            0     0     0     0     0            0                    0
## 3            0     0     0     0     0            0                    1
## 4            0     0     0     0     0            0                    0
## 5            0     0     0     0     0            0                    0
## 6            0     0     0     0     0            0                    0
##   Quincena Viernes_Desp_Qincena  ANO SEMANA DIA DIA_SEMANA MES
## 1        0                    0 2014     01  01  miércoles  01
## 2        0                    0 2014     01  02     jueves  01
## 3        0                    0 2014     01  03    viernes  01
## 4        0                    0 2014     01  04     sábado  01
## 5        0                    0 2014     01  05    domingo  01
## 6        0                    0 2014     02  06      lunes  01
##   Total_Accidentes Accidentes_Graves
## 1               74                21
## 2               72                20
## 3               93                13
## 4               68                19
## 5               67                14
## 6               43                11

#DIA_NOMBRE+DIA+SEMANA+MES+ANO+AmoryAmistad+Madre+Padre+Mujer+Prima+Semana.Santa+Feriado

library(caret)
## Loading required package: lattice
trcntrl = trainControl(method="cv", number=10)
caret_knn_fit = caret::train(Total_Accidentes~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "knn", trControl = trcntrl,
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
summary(caret_knn_fit)
##             Length Class      Mode     
## learn        2     -none-     list     
## k            1     -none-     numeric  
## theDots      0     -none-     list     
## xNames      63     -none-     character
## problemType  1     -none-     character
## tuneValue    1     data.frame list     
## obsLevels    1     -none-     logical  
## param        0     -none-     list
caret_knn_fit
## k-Nearest Neighbors 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1313, 1315, 1316, 1316, 1315, 1315, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared    MAE     
##    5  30.54405  0.04852764  26.07710
##    7  27.82936  0.05057528  23.31746
##    9  26.25975  0.06452191  21.62607
##   11  25.54304  0.07395262  20.70324
##   13  25.25037  0.07830750  20.13094
##   15  25.13014  0.07935002  19.86330
##   17  25.13203  0.07863251  19.73166
##   19  25.08799  0.08092453  19.64728
##   21  25.16812  0.07471798  19.70308
##   23  25.05694  0.08113243  19.88569
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 23.
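
For intuition, the prediction step behind `method = "knn"` with `preProcess = c("center", "scale")` can be sketched in base R. This is only a minimal illustration on made-up numbers, not caret's implementation; `knn_predict` and the toy data are hypothetical names introduced here.

```r
# Minimal kNN regression sketch (base R only; hypothetical helper, not caret's code).
knn_predict <- function(x_train, y_train, x_new, k) {
  # Center and scale with the TRAINING statistics, then apply them to the new rows.
  mu <- colMeans(x_train)
  sdv <- apply(x_train, 2, sd)
  z_train <- scale(x_train, center = mu, scale = sdv)
  z_new <- scale(x_new, center = mu, scale = sdv)
  apply(z_new, 1, function(row) {
    d <- sqrt(rowSums(sweep(z_train, 2, row)^2))  # Euclidean distance to each training row
    mean(y_train[order(d)[seq_len(k)]])           # average target of the k nearest rows
  })
}

# Toy data (assumed values): two numeric predictors and one numeric target.
x_train <- cbind(a = c(0, 1, 2, 10, 11, 12), b = c(0, 1, 0, 10, 11, 10))
y_train <- c(10, 12, 11, 50, 52, 51)
x_new <- cbind(a = c(1, 11), b = c(0.5, 10.5))
pred <- knn_predict(x_train, y_train, x_new, k = 3)
print(pred)
```

With a larger `k` each prediction averages over more neighbours, which smooths the predictions; this is the trade-off the tuning grid above explores.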

MSE and RMSE on the training data

y_tr_pred_knn<-predict(caret_knn_fit,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_knn<-mean((Train_D_Dataset$Total_Accidentes-y_tr_pred_knn)^2) # training MSE
RMSE_tr_knn = sqrt(mse_tr_knn)
mse_tr_knn
## [1] 576.2214
RMSE_tr_knn
## [1] 24.00461
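
The same two-step calculation (mean squared error, then its square root) is repeated for every model in this report, so it could be wrapped in small helpers. The sketch below assumes hypothetical function names `mse()` and `rmse()`, which are not part of the original code; the "predicted" values are invented for illustration.

```r
# Hypothetical helpers; the names mse() and rmse() are not from the original code.
mse <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Toy vectors: the predicted values are made up for this example.
actual <- c(74, 72, 93, 68)
predicted <- c(70, 75, 90, 70)
mse_toy <- mse(actual, predicted)    # (16 + 9 + 9 + 4) / 4
rmse_toy <- rmse(actual, predicted)  # square root of the MSE
print(c(MSE = mse_toy, RMSE = rmse_toy))
```

RMSE is expressed in the same units as the target (accidents per day), which makes it easier to read than MSE.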

MSE and RMSE on the test data (2018)

y_test_pred_knn<-predict(caret_knn_fit,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_knn<-mean((Test_D_Dataset$Total_Accidentes-y_test_pred_knn)^2) # test MSE
RMSE_test_knn = sqrt(mse_test_knn)
mse_test_knn
## [1] 629.9186
RMSE_test_knn
## [1] 25.09818
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_knn,
            name='Modelo knn',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Collisions

trcntrl = trainControl(method="cv", number=10)
caret_knn_fit_Ch = caret::train(Choque~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "knn", trControl = trcntrl,
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
summary(caret_knn_fit_Ch)
##             Length Class      Mode     
## learn        2     -none-     list     
## k            1     -none-     numeric  
## theDots      0     -none-     list     
## xNames      63     -none-     character
## problemType  1     -none-     character
## tuneValue    1     data.frame list     
## obsLevels    1     -none-     logical  
## param        0     -none-     list
caret_knn_fit_Ch
## k-Nearest Neighbors 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1315, 1314, 1317, 1314, 1315, 1314, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared    MAE     
##    5  24.74517  0.04171384  21.39042
##    7  22.53987  0.04289098  19.05299
##    9  21.28711  0.05153353  17.68713
##   11  20.58471  0.06268419  16.70616
##   13  20.29705  0.06945246  16.32945
##   15  20.15232  0.07633053  16.04545
##   17  20.14957  0.07621763  15.94365
##   19  20.09285  0.08181138  15.86564
##   21  20.04423  0.08383903  15.85666
##   23  20.14198  0.07589901  16.05049
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
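
The "Summary of sample sizes" line in the caret output lists how many rows are used for fitting in each of the 10 folds. A plain random 10-fold split of 1461 rows, sketched below, gives training sizes of 1314 or 1315. This assumes simple random folds; caret builds its folds differently in detail, which is why the sizes it prints vary slightly more (1314 to 1317 above).

```r
# Sketch of a plain 10-fold split for n = 1461 rows (assumption: simple random
# folds; caret's own fold construction differs in detail).
set.seed(1)
n <- 1461
fold_id <- sample(rep(1:10, length.out = n))   # shuffled fold labels, one per row
holdout_sizes <- as.integer(table(fold_id))    # rows held out in each fold
train_sizes <- n - holdout_sizes               # rows used for fitting in each fold
print(train_sizes)
```
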
y_tr_pred_knn_ch<-predict(caret_knn_fit_Ch,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_knn_ch<-mean((Train_D_Dataset$Choque-y_tr_pred_knn_ch)^2) # training MSE
RMSE_tr_knn_ch = sqrt(mse_tr_knn_ch)
mse_tr_knn_ch
## [1] 370.3256
RMSE_tr_knn_ch
## [1] 19.24385
y_test_pred_knn_ch<-predict(caret_knn_fit_Ch,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_knn_ch<-mean((Test_D_Dataset$Choque-y_test_pred_knn_ch)^2) # test MSE
RMSE_test_knn_ch = sqrt(mse_test_knn_ch)
mse_test_knn_ch
## [1] 380.936
RMSE_test_knn_ch
## [1] 19.51758
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Choque,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_knn_ch,
            name='Modelo knn',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Choques',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Serious Accidents

trcntrl = trainControl(method="cv", number=10)
caret_knn_fit_gr = caret::train(Accidentes_Graves~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "knn", trControl = trcntrl,
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
summary(caret_knn_fit_gr)
##             Length Class      Mode     
## learn        2     -none-     list     
## k            1     -none-     numeric  
## theDots      0     -none-     list     
## xNames      63     -none-     character
## problemType  1     -none-     character
## tuneValue    1     data.frame list     
## obsLevels    1     -none-     logical  
## param        0     -none-     list
caret_knn_fit_gr
## k-Nearest Neighbors 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1316, 1315, 1315, 1315, 1314, 1315, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared    MAE     
##    5  7.170948  0.02435963  5.594836
##    7  6.876460  0.03325676  5.374907
##    9  6.773950  0.03372694  5.317876
##   11  6.706650  0.03861691  5.264893
##   13  6.661325  0.04345227  5.255031
##   15  6.624616  0.05005969  5.207789
##   17  6.642902  0.04657338  5.224121
##   19  6.638075  0.04545474  5.210832
##   21  6.643289  0.04463043  5.202793
##   23  6.640412  0.04467729  5.187826
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 15.
y_tr_pred_knn_gr<-predict(caret_knn_fit_gr,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_knn_gr<-mean((Train_D_Dataset$Accidentes_Graves-y_tr_pred_knn_gr)^2) # training MSE
RMSE_tr_knn_gr = sqrt(mse_tr_knn_gr)
mse_tr_knn_gr
## [1] 40.21077
RMSE_tr_knn_gr
## [1] 6.341196
y_test_pred_knn_gr<-predict(caret_knn_fit_gr,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_knn_gr<-mean((Test_D_Dataset$Accidentes_Graves-y_test_pred_knn_gr)^2) # test MSE
RMSE_test_knn_gr = sqrt(mse_test_knn_gr)
mse_test_knn_gr
## [1] 45.44197
RMSE_test_knn_gr
## [1] 6.741066
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Accidentes_Graves,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_knn_gr,
            name='Modelo knn',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Accidentes Graves',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Other

trcntrl = trainControl(method="cv", number=10)
caret_knn_fit_ot = caret::train(Otro~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "knn", trControl = trcntrl,
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
summary(caret_knn_fit_ot)
##             Length Class      Mode     
## learn        2     -none-     list     
## k            1     -none-     numeric  
## theDots      0     -none-     list     
## xNames      63     -none-     character
## problemType  1     -none-     character
## tuneValue    1     data.frame list     
## obsLevels    1     -none-     logical  
## param        0     -none-     list
caret_knn_fit_ot
## k-Nearest Neighbors 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1315, 1315, 1315, 1316, 1315, 1314, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared    MAE     
##    5  5.208633  0.01213720  3.976676
##    7  5.075883  0.01690476  3.880784
##    9  5.039922  0.01559127  3.874442
##   11  4.997860  0.01704169  3.851977
##   13  4.970320  0.01533633  3.854434
##   15  4.966180  0.01488424  3.851711
##   17  4.955894  0.01507451  3.849701
##   19  4.942486  0.01743955  3.838876
##   21  4.927215  0.01903057  3.831270
##   23  4.914770  0.01931428  3.813755
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 23.
y_tr_pred_knn_ot<-predict(caret_knn_fit_ot,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_knn_ot<-mean((Train_D_Dataset$Otro-y_tr_pred_knn_ot)^2) # training MSE
RMSE_tr_knn_ot = sqrt(mse_tr_knn_ot)
mse_tr_knn_ot
## [1] 22.38182
RMSE_tr_knn_ot
## [1] 4.730943
y_test_pred_knn_ot<-predict(caret_knn_fit_ot,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_knn_ot<-mean((Test_D_Dataset$Otro-y_test_pred_knn_ot)^2) # test MSE
RMSE_test_knn_ot = sqrt(mse_test_knn_ot)
mse_test_knn_ot
## [1] 24.70889
RMSE_test_knn_ot
## [1] 4.970804
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Otro,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_knn_ot,
            name='Modelo knn',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Otros Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Summary of the KNN models for the different accident types

  • TOTAL ACCIDENTS (test set, 2018): MSE = 629.9185512, RMSE = 25.09818
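
The test-set (2018) results printed in the KNN sections can be gathered into one table. The sketch below copies the MSE values from that output and derives RMSE as the square root of MSE; the data frame name `knn_summary` is hypothetical.

```r
# Test-set (2018) MSE values copied from the printed output of each kNN model.
knn_summary <- data.frame(
  target = c("Total_Accidentes", "Choque", "Accidentes_Graves", "Otro"),
  MSE = c(629.9186, 380.936, 45.44197, 24.70889)
)
knn_summary$RMSE <- sqrt(knn_summary$MSE)  # RMSE is the square root of MSE
print(knn_summary)
```

The errors are not directly comparable across targets because the daily counts differ in scale (total accidents are far more frequent than serious ones).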

2. LINEAR REGRESSION

Total Accidents

head(Train_D_Dataset)
##        FECHA Atropello Caida_Ocupante Choque Otro Volcamiento Lunes martes
## 1 2014-01-01        13              7     35   18           1     0      0
## 2 2014-01-02        12              7     43    9           1     0      0
## 3 2014-01-03         7              5     67   13           1     0      0
## 4 2014-01-04        11              7     40    9           1     0      0
## 5 2014-01-05         6              5     43   10           3     0      0
## 6 2014-01-06         4              5     23    9           2     1      0
##   miercoles jueves viernes sabado domingo Enero Febrero Marzo Abril Mayo
## 1         1      0       0      0       0     1       0     0     0    0
## 2         0      1       0      0       0     1       0     0     0    0
## 3         0      0       1      0       0     1       0     0     0    0
## 4         0      0       0      1       0     1       0     0     0    0
## 5         0      0       0      0       1     1       0     0     0    0
## 6         0      0       0      0       0     1       0     0     0    0
##   Junio Julio Agosto Septiembre Octubre Noviembre Diciembre Feriado
## 1     0     0      0          0       0         0         0       1
## 2     0     0      0          0       0         0         0       0
## 3     0     0      0          0       0         0         0       0
## 4     0     0      0          0       0         0         0       0
## 5     0     0      0          0       0         0         0       0
## 6     0     0      0          0       0         0         0       1
##   Semana.Santa Prima Mujer Padre Madre AmoryAmistad Viernes_Antes_Puente
## 1            0     0     0     0     0            0                    0
## 2            0     0     0     0     0            0                    0
## 3            0     0     0     0     0            0                    1
## 4            0     0     0     0     0            0                    0
## 5            0     0     0     0     0            0                    0
## 6            0     0     0     0     0            0                    0
##   Quincena Viernes_Desp_Qincena  ANO SEMANA DIA DIA_SEMANA MES
## 1        0                    0 2014     01  01  miércoles  01
## 2        0                    0 2014     01  02     jueves  01
## 3        0                    0 2014     01  03    viernes  01
## 4        0                    0 2014     01  04     sábado  01
## 5        0                    0 2014     01  05    domingo  01
## 6        0                    0 2014     02  06      lunes  01
##   Total_Accidentes Accidentes_Graves
## 1               74                21
## 2               72                20
## 3               93                13
## 4               68                19
## 5               67                14
## 6               43                11
trcntrl = trainControl(method="cv", number=10)
caret_lm_fit = caret::train(Total_Accidentes~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "lm", trControl = trcntrl,
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
summary(caret_lm_fit)
## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.162 -10.047  -0.588   8.959  69.282 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          115.7276     0.4078 283.784  < 2e-16 ***
## SEMANA02               1.2376     0.5835   2.121  0.03410 *  
## SEMANA03               3.4399     0.5844   5.886 4.94e-09 ***
## SEMANA04               3.5403     0.5849   6.053 1.82e-09 ***
## SEMANA05               4.1744     0.5846   7.140 1.49e-12 ***
## SEMANA06               4.8285     0.5846   8.260 3.35e-16 ***
## SEMANA07               5.1933     0.5848   8.880  < 2e-16 ***
## SEMANA08               4.7466     0.5844   8.122 9.99e-16 ***
## SEMANA09               4.6656     0.5849   7.977 3.11e-15 ***
## SEMANA10               5.6857     0.5846   9.726  < 2e-16 ***
## SEMANA11               5.5909     0.5842   9.571  < 2e-16 ***
## SEMANA12               4.4708     0.5919   7.554 7.60e-14 ***
## SEMANA13               5.0444     0.5848   8.627  < 2e-16 ***
## SEMANA14               5.5155     0.5996   9.199  < 2e-16 ***
## SEMANA15               5.3791     0.6000   8.966  < 2e-16 ***
## SEMANA16               4.0457     0.5940   6.811 1.44e-11 ***
## SEMANA17               5.6406     0.5837   9.664  < 2e-16 ***
## SEMANA18               5.4263     0.5835   9.300  < 2e-16 ***
## SEMANA19               5.1669     0.5845   8.840  < 2e-16 ***
## SEMANA20               5.2001     0.5838   8.907  < 2e-16 ***
## SEMANA21               4.9700     0.5829   8.527  < 2e-16 ***
## SEMANA22               4.3113     0.5830   7.395 2.43e-13 ***
## SEMANA23               5.2293     0.5831   8.968  < 2e-16 ***
## SEMANA24               4.7994     0.5829   8.234 4.11e-16 ***
## SEMANA25               3.9004     0.5825   6.696 3.10e-11 ***
## SEMANA26               3.0130     0.5825   5.172 2.65e-07 ***
## SEMANA27               4.2672     0.5841   7.305 4.64e-13 ***
## SEMANA28               4.6929     0.5849   8.024 2.15e-15 ***
## SEMANA29               5.2255     0.5835   8.956  < 2e-16 ***
## SEMANA30               5.2697     0.5845   9.016  < 2e-16 ***
## SEMANA31               6.6748     0.5835  11.440  < 2e-16 ***
## SEMANA32               6.0952     0.5832  10.450  < 2e-16 ***
## SEMANA33               5.3466     0.5829   9.172  < 2e-16 ***
## SEMANA34               4.5076     0.5839   7.720 2.21e-14 ***
## SEMANA35               5.2800     0.5845   9.033  < 2e-16 ***
## SEMANA36               4.8264     0.5843   8.260 3.35e-16 ***
## SEMANA37               5.8110     0.5845   9.941  < 2e-16 ***
## SEMANA38               5.8040     0.5845   9.931  < 2e-16 ***
## SEMANA39               4.5949     0.5849   7.856 7.84e-15 ***
## SEMANA40               5.7074     0.5847   9.761  < 2e-16 ***
## SEMANA41               3.7066     0.5832   6.355 2.81e-10 ***
## SEMANA42               4.7340     0.5844   8.100 1.18e-15 ***
## SEMANA43               4.8565     0.5849   8.302 2.38e-16 ***
## SEMANA44               4.6772     0.5833   8.019 2.24e-15 ***
## SEMANA45               3.9045     0.5828   6.700 3.02e-11 ***
## SEMANA46               4.8501     0.5830   8.319  < 2e-16 ***
## SEMANA47               4.6346     0.5839   7.937 4.21e-15 ***
## SEMANA48               4.6285     0.5845   7.918 4.88e-15 ***
## SEMANA49               4.9809     0.5830   8.543  < 2e-16 ***
## SEMANA50               5.1014     0.5840   8.735  < 2e-16 ***
## SEMANA51               5.8308     0.5833   9.996  < 2e-16 ***
## SEMANA52               2.8152     0.5837   4.823 1.57e-06 ***
## SEMANA53               0.1876     0.4593   0.408  0.68301    
## DIA_SEMANAjueves      15.3319     0.5345  28.686  < 2e-16 ***
## DIA_SEMANAlunes       15.6617     0.5459  28.687  < 2e-16 ***
## DIA_SEMANAmartes      16.2709     0.5352  30.404  < 2e-16 ***
## DIA_SEMANAmiércoles   15.2980     0.5354  28.575  < 2e-16 ***
## DIA_SEMANAsábado      12.6912     0.5359  23.681  < 2e-16 ***
## DIA_SEMANAviernes     17.2568     0.6806  25.356  < 2e-16 ***
## Semana.Santa          -1.3413     0.4942  -2.714  0.00672 ** 
## Feriado              -10.1955     0.4632 -22.010  < 2e-16 ***
## Quincena              -0.3212     0.4165  -0.771  0.44070    
## Viernes_Desp_Qincena   0.1768     0.5527   0.320  0.74911    
## Viernes_Antes_Puente   0.4475     0.4718   0.949  0.34300    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.59 on 1397 degrees of freedom
## Multiple R-squared:  0.6585, Adjusted R-squared:  0.6431 
## F-statistic: 42.76 on 63 and 1397 DF,  p-value: < 2.2e-16
caret_lm_fit
## Linear Regression 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1316, 1314, 1314, 1316, 1315, 1314, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   15.92204  0.6314618  12.31408
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
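
caret reports 7 predictors but "centered (63), scaled (63)". The difference comes from dummy coding of the factors: judging from the coefficient table above, `SEMANA` has 53 levels (52 dummy columns) and `DIA_SEMANA` has 7 levels (6 dummy columns), and the remaining 5 predictors are numeric. The sketch below reproduces that count on a placeholder data frame; `toy` and its values are assumptions for illustration only.

```r
# Toy data frame with the same predictor structure (values are placeholders).
days <- c("domingo", "jueves", "lunes", "martes", "miércoles", "sábado", "viernes")
toy <- expand.grid(SEMANA = sprintf("%02d", 1:53), DIA_SEMANA = days,
                   stringsAsFactors = TRUE)
for (v in c("Semana.Santa", "Feriado", "Quincena",
            "Viernes_Desp_Qincena", "Viernes_Antes_Puente")) toy[[v]] <- 0
X <- model.matrix(~ ., data = toy)   # design matrix, includes an intercept column
n_columns <- ncol(X) - 1             # drop the intercept: 52 + 6 + 5
print(n_columns)
```

The first level of each factor (week `01` and `domingo`) is the reference category, so its effect is absorbed into the intercept.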

MSE and RMSE on the training and test data

y_tr_pred_lm<-predict(caret_lm_fit,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_lm<-mean((Train_D_Dataset$Total_Accidentes-y_tr_pred_lm)^2) # training MSE
RMSE_tr_lm = sqrt(mse_tr_lm)
mse_tr_lm
## [1] 232.3244
RMSE_tr_lm
## [1] 15.24219
y_test_pred_lm<-predict(caret_lm_fit,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_lm<-mean((Test_D_Dataset$Total_Accidentes-y_test_pred_lm)^2) # test MSE
RMSE_test_lm = sqrt(mse_test_lm)
mse_test_lm
## [1] 254.0824
RMSE_test_lm
## [1] 15.93996
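
As a consistency check, the training MSE can be related to the "Residual standard error" reported by `summary()`. The MSE divides the residual sum of squares by the number of rows, while the residual standard error divides it by the residual degrees of freedom. The sketch below uses only numbers copied from the output above; the variable names are hypothetical.

```r
# Values copied from the printed output above.
n <- 1461              # training rows
df_resid <- 1397       # residual degrees of freedom from summary()
mse_train <- 232.3244  # training MSE
rss <- n * mse_train             # residual sum of squares
rse <- sqrt(rss / df_resid)      # residual standard error
print(round(rse, 2))
```

The result should agree with the 15.59 shown in the model summary, which confirms that the training RMSE of 15.24 and the reported standard error describe the same residuals.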

In-sample prediction

plot_ly (data=Train_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_tr_pred_lm,
            name='Modelo lm',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Plot of the 2018 series

plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_lm,
            name='Modelo lm',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Collisions

trcntrl = trainControl(method="cv", number=10)
caret_lm_fit_ch = caret::train(Choque~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "lm", trControl = trcntrl,
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
summary(caret_lm_fit_ch)
## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.118  -7.832  -0.475   7.333  44.379 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          77.50376    0.31378 247.000  < 2e-16 ***
## SEMANA02              1.26133    0.44900   2.809  0.00504 ** 
## SEMANA03              2.64864    0.44966   5.890 4.82e-09 ***
## SEMANA04              3.00263    0.45001   6.672 3.62e-11 ***
## SEMANA05              3.26434    0.44984   7.257 6.56e-13 ***
## SEMANA06              3.35852    0.44980   7.467 1.44e-13 ***
## SEMANA07              4.31915    0.44999   9.598  < 2e-16 ***
## SEMANA08              3.89218    0.44968   8.656  < 2e-16 ***
## SEMANA09              3.72621    0.45006   8.279 2.86e-16 ***
## SEMANA10              4.15694    0.44980   9.242  < 2e-16 ***
## SEMANA11              4.07270    0.44948   9.061  < 2e-16 ***
## SEMANA12              3.37787    0.45540   7.417 2.07e-13 ***
## SEMANA13              3.68188    0.44993   8.183 6.16e-16 ***
## SEMANA14              4.27341    0.46136   9.263  < 2e-16 ***
## SEMANA15              4.38193    0.46164   9.492  < 2e-16 ***
## SEMANA16              3.19976    0.45704   7.001 3.93e-12 ***
## SEMANA17              4.54173    0.44911  10.113  < 2e-16 ***
## SEMANA18              4.21887    0.44893   9.398  < 2e-16 ***
## SEMANA19              4.10791    0.44975   9.134  < 2e-16 ***
## SEMANA20              4.06900    0.44920   9.058  < 2e-16 ***
## SEMANA21              3.51132    0.44847   7.829 9.62e-15 ***
## SEMANA22              3.24565    0.44858   7.235 7.63e-13 ***
## SEMANA23              3.76270    0.44865   8.387  < 2e-16 ***
## SEMANA24              3.64271    0.44848   8.122 9.95e-16 ***
## SEMANA25              2.97196    0.44820   6.631 4.76e-11 ***
## SEMANA26              2.58447    0.44822   5.766 9.98e-09 ***
## SEMANA27              3.37844    0.44946   7.517 1.00e-13 ***
## SEMANA28              3.46450    0.45002   7.698 2.59e-14 ***
## SEMANA29              3.92987    0.44893   8.754  < 2e-16 ***
## SEMANA30              4.07362    0.44975   9.058  < 2e-16 ***
## SEMANA31              4.94896    0.44895  11.023  < 2e-16 ***
## SEMANA32              4.66251    0.44878  10.389  < 2e-16 ***
## SEMANA33              3.96387    0.44852   8.838  < 2e-16 ***
## SEMANA34              3.39716    0.44928   7.561 7.19e-14 ***
## SEMANA35              3.39685    0.44978   7.552 7.69e-14 ***
## SEMANA36              3.79075    0.44962   8.431  < 2e-16 ***
## SEMANA37              4.23817    0.44976   9.423  < 2e-16 ***
## SEMANA38              4.36502    0.44971   9.706  < 2e-16 ***
## SEMANA39              3.41552    0.45002   7.590 5.83e-14 ***
## SEMANA40              4.51661    0.44992  10.039  < 2e-16 ***
## SEMANA41              3.17107    0.44876   7.066 2.50e-12 ***
## SEMANA42              3.49085    0.44967   7.763 1.59e-14 ***
## SEMANA43              3.71145    0.45008   8.246 3.73e-16 ***
## SEMANA44              3.91307    0.44881   8.719  < 2e-16 ***
## SEMANA45              2.95727    0.44842   6.595 6.02e-11 ***
## SEMANA46              3.90202    0.44858   8.699  < 2e-16 ***
## SEMANA47              3.49686    0.44928   7.783 1.37e-14 ***
## SEMANA48              3.88178    0.44978   8.630  < 2e-16 ***
## SEMANA49              4.22457    0.44860   9.417  < 2e-16 ***
## SEMANA50              4.45513    0.44937   9.914  < 2e-16 ***
## SEMANA51              4.60465    0.44882  10.259  < 2e-16 ***
## SEMANA52              1.99434    0.44915   4.440 9.69e-06 ***
## SEMANA53              0.41755    0.35344   1.181  0.23765    
## DIA_SEMANAjueves     12.29178    0.41124  29.889  < 2e-16 ***
## DIA_SEMANAlunes      12.86427    0.42008  30.624  < 2e-16 ***
## DIA_SEMANAmartes     13.56922    0.41178  32.953  < 2e-16 ***
## DIA_SEMANAmiércoles  12.56514    0.41193  30.503  < 2e-16 ***
## DIA_SEMANAsábado     10.60223    0.41236  25.711  < 2e-16 ***
## DIA_SEMANAviernes    14.50901    0.52367  27.706  < 2e-16 ***
## Semana.Santa         -0.89104    0.38024  -2.343  0.01925 *  
## Feriado              -8.51712    0.35643 -23.896  < 2e-16 ***
## Quincena             -0.56810    0.32046  -1.773  0.07648 .  
## Viernes_Desp_Qincena  0.01274    0.42529   0.030  0.97611    
## Viernes_Antes_Puente  0.47114    0.36301   1.298  0.19455    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.99 on 1397 degrees of freedom
## Multiple R-squared:  0.6841, Adjusted R-squared:  0.6698 
## F-statistic: 48.01 on 63 and 1397 DF,  p-value: < 2.2e-16
caret_lm_fit_ch
## Linear Regression 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1317, 1314, 1314, 1315, 1316, 1314, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   12.33502  0.6547476  9.712046
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
y_tr_pred_lm_ch<-predict(caret_lm_fit_ch,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_lm_ch<-mean((Train_D_Dataset$Choque-y_tr_pred_lm_ch)^2) # training MSE
RMSE_tr_lm_ch = sqrt(mse_tr_lm_ch)
mse_tr_lm_ch
## [1] 137.5459
RMSE_tr_lm_ch
## [1] 11.728
y_test_pred_lm_ch<-predict(caret_lm_fit_ch,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_lm_ch<-mean((Test_D_Dataset$Choque-y_test_pred_lm_ch)^2) # test MSE
RMSE_test_lm_ch = sqrt(mse_test_lm_ch)
mse_test_lm_ch
## [1] 149.9768
RMSE_test_lm_ch
## [1] 12.2465
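
The test-set RMSE values printed so far allow a direct comparison between KNN and linear regression for the two targets fitted with both methods. The sketch below copies those values from the output and computes the relative reduction in RMSE; the data frame name `rmse_test` is hypothetical.

```r
# Test-set (2018) RMSE values copied from the printed output.
rmse_test <- data.frame(
  target = c("Total_Accidentes", "Choque"),
  knn = c(25.09818, 19.51758),
  lm = c(15.93996, 12.2465)
)
rmse_test$reduction <- 1 - rmse_test$lm / rmse_test$knn  # relative RMSE reduction
print(rmse_test)
```

For both targets the linear model's test RMSE is roughly 36 to 37 percent lower than that of KNN.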

In-sample prediction

plot_ly (data=Train_D_Dataset,
         x = ~FECHA,
         y = ~Choque,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_tr_pred_lm_ch,
            name='Modelo lm',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Choques',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

Plot of the 2018 series

plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Choque,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_lm_ch,
            name='Modelo lm',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Choques',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

3. GENERALIZED LINEAR MODEL

glm_fit<-glm(Total_Accidentes~SEMANA+DIA_SEMANA+DIA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset, family = "poisson")
summary(glm_fit)
## 
## Call:
## glm(formula = Total_Accidentes ~ SEMANA + DIA_SEMANA + DIA + 
##     Semana.Santa + Feriado + Quincena + Viernes_Desp_Qincena + 
##     Viernes_Antes_Puente, family = "poisson", data = Train_D_Dataset)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.6815  -0.9401  -0.0613   0.8115   5.9142  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           4.052e+00  2.712e-02 149.378  < 2e-16 ***
## SEMANA02              8.972e-02  3.065e-02   2.927  0.00342 ** 
## SEMANA03              2.439e-01  3.041e-02   8.019 1.07e-15 ***
## SEMANA04              2.780e-01  2.995e-02   9.282  < 2e-16 ***
## SEMANA05              3.272e-01  2.860e-02  11.438  < 2e-16 ***
## SEMANA06              3.515e-01  2.832e-02  12.411  < 2e-16 ***
## SEMANA07              3.583e-01  2.951e-02  12.140  < 2e-16 ***
## SEMANA08              3.392e-01  2.981e-02  11.378  < 2e-16 ***
## SEMANA09              3.517e-01  2.891e-02  12.166  < 2e-16 ***
## SEMANA10              4.004e-01  2.805e-02  14.275  < 2e-16 ***
## SEMANA11              3.785e-01  2.947e-02  12.842  < 2e-16 ***
## SEMANA12              3.211e-01  3.063e-02  10.486  < 2e-16 ***
## SEMANA13              3.820e-01  2.918e-02  13.093  < 2e-16 ***
## SEMANA14              3.992e-01  2.867e-02  13.927  < 2e-16 ***
## SEMANA15              3.768e-01  2.979e-02  12.647  < 2e-16 ***
## SEMANA16              2.818e-01  3.076e-02   9.160  < 2e-16 ***
## SEMANA17              4.068e-01  2.930e-02  13.883  < 2e-16 ***
## SEMANA18              4.040e-01  2.811e-02  14.374  < 2e-16 ***
## SEMANA19              3.713e-01  2.867e-02  12.952  < 2e-16 ***
## SEMANA20              3.524e-01  2.970e-02  11.867  < 2e-16 ***
## SEMANA21              3.661e-01  2.968e-02  12.336  < 2e-16 ***
## SEMANA22              3.429e-01  2.890e-02  11.864  < 2e-16 ***
## SEMANA23              3.764e-01  2.813e-02  13.383  < 2e-16 ***
## SEMANA24              3.425e-01  2.956e-02  11.588  < 2e-16 ***
## SEMANA25              2.804e-01  3.030e-02   9.256  < 2e-16 ***
## SEMANA26              2.480e-01  2.986e-02   8.307  < 2e-16 ***
## SEMANA27              3.247e-01  2.855e-02  11.373  < 2e-16 ***
## SEMANA28              3.362e-01  2.919e-02  11.515  < 2e-16 ***
## SEMANA29              3.545e-01  2.998e-02  11.825  < 2e-16 ***
## SEMANA30              3.929e-01  2.942e-02  13.356  < 2e-16 ***
## SEMANA31              4.705e-01  2.775e-02  16.956  < 2e-16 ***
## SEMANA32              4.352e-01  2.827e-02  15.394  < 2e-16 ***
## SEMANA33              3.644e-01  2.958e-02  12.320  < 2e-16 ***
## SEMANA34              3.359e-01  3.013e-02  11.148  < 2e-16 ***
## SEMANA35              3.997e-01  2.872e-02  13.918  < 2e-16 ***
## SEMANA36              3.519e-01  2.806e-02  12.543  < 2e-16 ***
## SEMANA37              4.033e-01  2.905e-02  13.884  < 2e-16 ***
## SEMANA38              3.913e-01  2.965e-02  13.200  < 2e-16 ***
## SEMANA39              3.497e-01  2.941e-02  11.893  < 2e-16 ***
## SEMANA40              4.119e-01  2.779e-02  14.819  < 2e-16 ***
## SEMANA41              2.748e-01  2.923e-02   9.403  < 2e-16 ***
## SEMANA42              3.301e-01  3.025e-02  10.911  < 2e-16 ***
## SEMANA43              3.611e-01  2.960e-02  12.201  < 2e-16 ***
## SEMANA44              3.608e-01  2.844e-02  12.686  < 2e-16 ***
## SEMANA45              2.949e-01  2.892e-02  10.198  < 2e-16 ***
## SEMANA46              3.405e-01  2.971e-02  11.460  < 2e-16 ***
## SEMANA47              3.351e-01  3.008e-02  11.142  < 2e-16 ***
## SEMANA48              3.542e-01  2.901e-02  12.211  < 2e-16 ***
## SEMANA49              3.637e-01  2.812e-02  12.936  < 2e-16 ***
## SEMANA50              3.649e-01  2.943e-02  12.401  < 2e-16 ***
## SEMANA51              3.893e-01  2.967e-02  13.119  < 2e-16 ***
## SEMANA52              2.298e-01  3.041e-02   7.559 4.08e-14 ***
## SEMANA53              4.368e-02  4.806e-02   0.909  0.36343    
## DIA_SEMANAjueves      4.468e-01  1.007e-02  44.387  < 2e-16 ***
## DIA_SEMANAlunes       4.601e-01  1.030e-02  44.647  < 2e-16 ***
## DIA_SEMANAmartes      4.675e-01  1.002e-02  46.674  < 2e-16 ***
## DIA_SEMANAmiércoles   4.453e-01  1.005e-02  44.304  < 2e-16 ***
## DIA_SEMANAsábado      3.836e-01  1.018e-02  37.682  < 2e-16 ***
## DIA_SEMANAviernes     4.978e-01  1.228e-02  40.533  < 2e-16 ***
## DIA02                -6.085e-02  1.942e-02  -3.134  0.00172 ** 
## DIA03                -2.768e-02  1.942e-02  -1.425  0.15407    
## DIA04                -3.967e-02  1.962e-02  -2.022  0.04317 *  
## DIA05                 3.952e-03  1.970e-02   0.201  0.84104    
## DIA06                -1.417e-02  2.015e-02  -0.703  0.48188    
## DIA07                 2.947e-02  2.034e-02   1.449  0.14739    
## DIA08                -2.281e-02  2.095e-02  -1.089  0.27608    
## DIA09                -4.209e-02  2.113e-02  -1.992  0.04634 *  
## DIA10                 7.543e-04  2.114e-02   0.036  0.97154    
## DIA11                -1.591e-02  2.149e-02  -0.740  0.45924    
## DIA12                -3.117e-02  2.182e-02  -1.428  0.15315    
## DIA13                -3.675e-02  2.197e-02  -1.673  0.09438 .  
## DIA14                -1.467e-02  2.190e-02  -0.670  0.50302    
## DIA15                 3.408e-03  3.421e-02   0.100  0.92064    
## DIA16                 2.433e-02  2.173e-02   1.120  0.26276    
## DIA17                 2.914e-02  2.172e-02   1.342  0.17975    
## DIA18                 3.754e-02  2.171e-02   1.729  0.08383 .  
## DIA19                -6.384e-03  2.182e-02  -0.293  0.76986    
## DIA20                -6.073e-05  2.174e-02  -0.003  0.99777    
## DIA21                 5.211e-03  2.139e-02   0.244  0.80752    
## DIA22                -3.261e-02  2.142e-02  -1.522  0.12792    
## DIA23                -1.240e-02  2.113e-02  -0.587  0.55721    
## DIA24                -4.334e-02  2.091e-02  -2.072  0.03823 *  
## DIA25                -2.404e-02  2.068e-02  -1.162  0.24509    
## DIA26                -4.959e-02  2.052e-02  -2.417  0.01564 *  
## DIA27                -4.147e-02  2.000e-02  -2.073  0.03818 *  
## DIA28                -1.227e-02  1.965e-02  -0.624  0.53231    
## DIA29                -3.712e-02  2.001e-02  -1.856  0.06352 .  
## DIA30                -4.994e-02  2.248e-02  -2.221  0.02632 *  
## DIA31                -7.889e-02  3.597e-02  -2.193  0.02832 *  
## Semana.Santa         -1.017e-01  2.094e-02  -4.857 1.19e-06 ***
## Feriado              -5.012e-01  1.504e-02 -33.315  < 2e-16 ***
## Quincena              2.597e-03  2.629e-02   0.099  0.92131    
## Viernes_Desp_Qincena -1.226e-02  1.351e-02  -0.907  0.36420    
## Viernes_Antes_Puente  2.202e-02  1.559e-02   1.412  0.15788    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 9170.5  on 1460  degrees of freedom
## Residual deviance: 2791.2  on 1367  degrees of freedom
## AIC: 12564
## 
## Number of Fisher Scoring iterations: 4
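
A rough dispersion check for a Poisson fit is the ratio of residual deviance to residual degrees of freedom; values well above 1 may suggest more variability than the Poisson family assumes. The sketch below uses only the two numbers printed in the summary above (the variable names are illustrative, not part of the original analysis):

```r
# Values copied from the summary output above
residual_deviance <- 2791.2
residual_df <- 1367

# Heuristic dispersion ratio: close to 1 is consistent with Poisson variance
dispersion_ratio <- residual_deviance / residual_df
round(dispersion_ratio, 2)
```

A ratio around 2, as here, is a heuristic hint rather than a formal test; if it holds up, the standard errors in the coefficient table are likely somewhat optimistic.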
glm_fit
## 
## Call:  glm(formula = Total_Accidentes ~ SEMANA + DIA_SEMANA + DIA + 
##     Semana.Santa + Feriado + Quincena + Viernes_Desp_Qincena + 
##     Viernes_Antes_Puente, family = "poisson", data = Train_D_Dataset)
## 
## Coefficients:
##          (Intercept)              SEMANA02              SEMANA03  
##            4.052e+00             8.972e-02             2.439e-01  
##             SEMANA04              SEMANA05              SEMANA06  
##            2.780e-01             3.272e-01             3.515e-01  
##             SEMANA07              SEMANA08              SEMANA09  
##            3.583e-01             3.392e-01             3.517e-01  
##             SEMANA10              SEMANA11              SEMANA12  
##            4.004e-01             3.785e-01             3.211e-01  
##             SEMANA13              SEMANA14              SEMANA15  
##            3.820e-01             3.992e-01             3.768e-01  
##             SEMANA16              SEMANA17              SEMANA18  
##            2.818e-01             4.068e-01             4.040e-01  
##             SEMANA19              SEMANA20              SEMANA21  
##            3.713e-01             3.524e-01             3.661e-01  
##             SEMANA22              SEMANA23              SEMANA24  
##            3.429e-01             3.764e-01             3.425e-01  
##             SEMANA25              SEMANA26              SEMANA27  
##            2.804e-01             2.480e-01             3.247e-01  
##             SEMANA28              SEMANA29              SEMANA30  
##            3.362e-01             3.545e-01             3.929e-01  
##             SEMANA31              SEMANA32              SEMANA33  
##            4.705e-01             4.352e-01             3.644e-01  
##             SEMANA34              SEMANA35              SEMANA36  
##            3.359e-01             3.997e-01             3.519e-01  
##             SEMANA37              SEMANA38              SEMANA39  
##            4.033e-01             3.913e-01             3.497e-01  
##             SEMANA40              SEMANA41              SEMANA42  
##            4.119e-01             2.748e-01             3.301e-01  
##             SEMANA43              SEMANA44              SEMANA45  
##            3.611e-01             3.608e-01             2.949e-01  
##             SEMANA46              SEMANA47              SEMANA48  
##            3.405e-01             3.351e-01             3.542e-01  
##             SEMANA49              SEMANA50              SEMANA51  
##            3.637e-01             3.649e-01             3.893e-01  
##             SEMANA52              SEMANA53      DIA_SEMANAjueves  
##            2.298e-01             4.368e-02             4.468e-01  
##      DIA_SEMANAlunes      DIA_SEMANAmartes   DIA_SEMANAmiércoles  
##            4.601e-01             4.675e-01             4.453e-01  
##     DIA_SEMANAsábado     DIA_SEMANAviernes                 DIA02  
##            3.836e-01             4.978e-01            -6.085e-02  
##                DIA03                 DIA04                 DIA05  
##           -2.768e-02            -3.967e-02             3.952e-03  
##                DIA06                 DIA07                 DIA08  
##           -1.417e-02             2.947e-02            -2.281e-02  
##                DIA09                 DIA10                 DIA11  
##           -4.209e-02             7.542e-04            -1.591e-02  
##                DIA12                 DIA13                 DIA14  
##           -3.117e-02            -3.675e-02            -1.467e-02  
##                DIA15                 DIA16                 DIA17  
##            3.408e-03             2.433e-02             2.914e-02  
##                DIA18                 DIA19                 DIA20  
##            3.754e-02            -6.384e-03            -6.073e-05  
##                DIA21                 DIA22                 DIA23  
##            5.211e-03            -3.261e-02            -1.240e-02  
##                DIA24                 DIA25                 DIA26  
##           -4.334e-02            -2.404e-02            -4.959e-02  
##                DIA27                 DIA28                 DIA29  
##           -4.147e-02            -1.227e-02            -3.712e-02  
##                DIA30                 DIA31          Semana.Santa  
##           -4.994e-02            -7.889e-02            -1.017e-01  
##              Feriado              Quincena  Viernes_Desp_Qincena  
##           -5.012e-01             2.597e-03            -1.226e-02  
## Viernes_Antes_Puente  
##            2.202e-02  
## 
## Degrees of Freedom: 1460 Total (i.e. Null);  1367 Residual
## Null Deviance:       9171 
## Residual Deviance: 2791  AIC: 12560
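
Because the model uses a log link, each coefficient is easier to read after exponentiating: the result is a multiplicative effect on the expected daily count relative to the baseline level (for DIA_SEMANA the omitted level is domingo). A minimal sketch with three coefficients copied from the output above (the vector names are illustrative):

```r
# Coefficients copied from the printed model
coefs <- c(Feriado = -0.5012,
           Semana.Santa = -0.1017,
           DIA_SEMANAviernes = 0.4978)

# Rate ratios: multiplicative change in expected accidents per day
rate_ratios <- exp(coefs)

# Same information expressed as a percentage change
pct_change <- 100 * (rate_ratios - 1)
round(cbind(rate_ratios, pct_change), 3)
```

Read this way, a holiday is associated with roughly 40% fewer expected accidents than a comparable non-holiday, and a Friday with roughly 65% more than a Sunday, holding the other terms fixed.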
y_tr_pred_glm<-predict(glm_fit,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","DIA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")],type="response")
mse_tr_glm<-mean((Train_D_Dataset$Total_Accidentes-y_tr_pred_glm)^2) # training MSE
RMSE_tr_glm = sqrt(mse_tr_glm)
mse_tr_glm
## [1] 211.596
RMSE_tr_glm
## [1] 14.54634
y_test_pred_glm<-predict(glm_fit,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","DIA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")],type="response")
# Test MSE: compare the test predictions against the test-set response.
# The original run subtracted them from Train_D_Dataset$Total_Accidentes, which
# recycled vectors of different lengths, so the two values printed below are
# not a valid test error and should be regenerated by re-running this chunk.
mse_test_glm<-mean((Test_D_Dataset$Total_Accidentes-y_test_pred_glm)^2)
RMSE_test_glm = sqrt(mse_test_glm)
mse_test_glm
## [1] 1101.145
RMSE_test_glm
## [1] 33.18351
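
The MSE and RMSE computation is repeated for every model and data split, and a mismatch between the response and prediction vectors is silently recycled by R with only a warning. One way to guard against that is a small helper that refuses vectors of different lengths. This is a sketch; `rmse` is a hypothetical helper, not a function used in the original analysis:

```r
# Hypothetical helper: RMSE with an explicit length check
rmse <- function(actual, predicted) {
  if (length(actual) != length(predicted)) {
    stop("actual and predicted must have the same length")
  }
  sqrt(mean((actual - predicted)^2))
}

# Works when both vectors line up
rmse(c(10, 20, 30), c(12, 18, 33))

# Fails loudly instead of recycling when they do not
length_check_fires <- tryCatch({
  rmse(1:4, 1:3)
  FALSE
}, error = function(e) TRUE)
```

With such a helper, a call like `rmse(Test_D_Dataset$Total_Accidentes, y_test_pred_glm)` would stop with an error if the wrong dataset were passed by mistake.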
plot_ly (data=Train_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_tr_pred_glm,
            name='Modelo glm',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_glm,
            name='Modelo glm',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

4. REGRESSION TREES

trcntrl = trainControl(method="cv", number=10)
caret_tree_fit = caret::train(Total_Accidentes~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente,data=Train_D_Dataset,
                              method = "rpart", trControl = trcntrl,
                      parms = list(split = "gini"), # classification-only option; rpart ignores it for a numeric response
                      preProcess=c("center", "scale"),
                      tuneLength = 10)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
caret_tree_fit
## CART 
## 
## 1461 samples
##    7 predictor
## 
## Pre-processing: centered (63), scaled (63) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1316, 1316, 1315, 1315, 1316, 1314, ... 
## Resampling results across tuning parameters:
## 
##   cp            RMSE      Rsquared   MAE     
##   6.424688e-05  16.80676  0.5855434  12.84698
##   1.890736e-04  16.80781  0.5854719  12.84759
##   2.089483e-04  16.80854  0.5854612  12.84841
##   6.408283e-04  16.80061  0.5859644  12.85642
##   8.458601e-04  16.76004  0.5879283  12.85201
##   1.709459e-03  16.79462  0.5866256  12.85094
##   2.310304e-03  16.81679  0.5856843  12.86666
##   3.426222e-03  16.82144  0.5855254  12.86090
##   6.807466e-02  19.76936  0.4057811  15.24499
##   1.793061e-01  25.37186  0.1288797  19.91892
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.0008458601.
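
The `cp` values in the table are cost-complexity thresholds: a split is kept only if it improves the relative error by at least `cp`, so larger values give smaller trees. The sketch below illustrates this with `rpart` directly on simulated counts; the data, the column names `weekday` and `holiday`, and the chosen `cp` values are assumptions for illustration and not the accident dataset:

```r
library(rpart)

set.seed(1)
# Simulated daily counts with a weekday effect and a holiday effect
sim <- data.frame(
  weekday = factor(sample(c("mon", "sat", "sun"), 300, replace = TRUE)),
  holiday = rbinom(300, 1, 0.1)
)
sim$y <- rpois(300, exp(4 + 0.4 * (sim$weekday == "mon") - 0.5 * sim$holiday))

# Grow a tree with no complexity penalty, then prune it with a larger cp
fit_big <- rpart(y ~ weekday + holiday, data = sim, method = "anova",
                 control = rpart.control(cp = 0))
fit_small <- prune(fit_big, cp = 0.05)

# Number of nodes before and after pruning
c(big = nrow(fit_big$frame), small = nrow(fit_small$frame))
```

The flat RMSE across the eight smallest `cp` values in the table above is consistent with this picture: below a certain threshold, extra splits add little.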
y_tr_pred_tree<-predict(caret_tree_fit,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_tree<-mean((Train_D_Dataset$Total_Accidentes-y_tr_pred_tree)^2) # training MSE
RMSE_tr_tree = sqrt(mse_tr_tree)
mse_tr_tree
## [1] 275.4052
RMSE_tr_tree
## [1] 16.59534
y_test_pred_tree<-predict(caret_tree_fit,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_tree<-mean((Test_D_Dataset$Total_Accidentes-y_test_pred_tree)^2) # test MSE
RMSE_test_tree = sqrt(mse_test_tree)
mse_test_tree
## [1] 269.4619
RMSE_test_tree
## [1] 16.41529
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_tree,
            name='Modelo arbol',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))
plot_ly (data=Train_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_tr_pred_tree,
            name='Modelo arbol',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))

5. RANDOM FOREST

trcntrl = trainControl(method="cv", number=10)
caret_rf_fit = caret::train(Total_Accidentes~SEMANA+DIA_SEMANA+Semana.Santa+Feriado+Quincena+Viernes_Desp_Qincena+Viernes_Antes_Puente, data=Train_D_Dataset,
                      method = "rf", trControl = trcntrl,
                      prox=TRUE,allowParallel=TRUE)
summary(caret_rf_fit)
##                 Length  Class      Mode     
## call                  6 -none-     call     
## type                  1 -none-     character
## predicted          1461 -none-     numeric  
## mse                 500 -none-     numeric  
## rsq                 500 -none-     numeric  
## oob.times          1461 -none-     numeric  
## importance           63 -none-     numeric  
## importanceSD          0 -none-     NULL     
## localImportance       0 -none-     NULL     
## proximity       2134521 -none-     numeric  
## ntree                 1 -none-     numeric  
## mtry                  1 -none-     numeric  
## forest               11 -none-     list     
## coefs                 0 -none-     NULL     
## y                  1461 -none-     numeric  
## test                  0 -none-     NULL     
## inbag                 0 -none-     NULL     
## xNames               63 -none-     character
## problemType           1 -none-     character
## tuneValue             1 data.frame list     
## obsLevels             1 -none-     logical  
## param                 2 -none-     list
caret_rf_fit
## Random Forest 
## 
## 1461 samples
##    7 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1315, 1314, 1313, 1316, 1315, 1315, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE     
##    2    22.59901  0.3847643  17.97227
##   32    17.81764  0.5517189  13.69226
##   63    17.95390  0.5488941  13.79388
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 32.
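
The output reports 7 predictors but `mtry` values up to 63 because the formula interface expands each factor into dummy columns: 52 for SEMANA (53 levels), 6 for DIA_SEMANA (7 levels), plus the 5 indicator variables. The sketch below reproduces that count with `model.matrix` on a placeholder data frame. The three-point grid shown is an assumption about how caret spaces its default `mtry` candidates, and the weekday level names are written without accents:

```r
# Placeholder data: only the factor levels matter for the column count
df <- data.frame(
  SEMANA = factor(sprintf("%02d", rep(1:5, 2)),
                  levels = sprintf("%02d", 1:53)),
  DIA_SEMANA = factor(rep(c("domingo", "lunes"), 5),
                      levels = c("domingo", "jueves", "lunes", "martes",
                                 "miercoles", "sabado", "viernes")),
  Semana.Santa = 0, Feriado = 0, Quincena = 0,
  Viernes_Desp_Qincena = 0, Viernes_Antes_Puente = 0
)

# Dummy-coded design matrix; drop the intercept column from the count
X <- model.matrix(~ ., data = df)
n_pred <- ncol(X) - 1

# Assumed default grid: three evenly spaced values between 2 and n_pred
mtry_grid <- floor(seq(2, n_pred, length.out = 3))
mtry_grid
```

Under that assumption the grid is 2, 32 and 63, matching the three rows in the resampling table.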
plot(caret_rf_fit)

y_tr_pred_rf<-predict(caret_rf_fit,Train_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_tr_rf<-mean((Train_D_Dataset$Total_Accidentes-y_tr_pred_rf)^2) # training MSE
RMSE_tr_rf = sqrt(mse_tr_rf)
mse_tr_rf
## [1] 152.6926
RMSE_tr_rf
## [1] 12.35689
y_test_pred_rf<-predict(caret_rf_fit,Test_D_Dataset[,c("SEMANA","DIA_SEMANA","Semana.Santa","Feriado","Quincena","Viernes_Desp_Qincena","Viernes_Antes_Puente")])
mse_test_rf<-mean((Test_D_Dataset$Total_Accidentes-y_test_pred_rf)^2) # test MSE
RMSE_test_rf = sqrt(mse_test_rf)
mse_test_rf
## [1] 284.6565
RMSE_test_rf
## [1] 16.87177
plot_ly (data=Test_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_test_pred_rf,
            name='Modelo rf',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))
plot_ly (data=Train_D_Dataset,
         x = ~FECHA,
         y = ~Total_Accidentes,
         type = "scatter" ,mode = "lines",
         name='Real',
         line=list(width=1,color='rgb(205, 12, 24)'))%>%
  add_trace(y= ~y_tr_pred_rf,
            name='Modelo rf',
            line=list(width=1,color='rgb(22, 96, 167)'))%>%
  layout(title='Total Accidentes',
         xaxis=list(title="Fecha"),
         yaxis=list(title="Accidentes"),
         legend = list(x = 0.75, y = 0.9))
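
To compare the models side by side, the RMSE values printed in sections 3 to 5 can be collected in one table. The sketch below copies those numbers by hand. The GLM test RMSE is left as `NA` because the printed 33.18 was computed against the training response (hence the length warning in that chunk), and `rmse_summary` is an illustrative name:

```r
# RMSE values copied from the outputs above
rmse_summary <- data.frame(
  model = c("glm (Poisson)", "rpart tree", "random forest"),
  rmse_train = c(14.54634, 16.59534, 12.35689),
  rmse_test = c(NA, 16.41529, 16.87177)
)

# Gap between test and training error as a rough overfitting indicator
rmse_summary$gap <- rmse_summary$rmse_test - rmse_summary$rmse_train

# Sort by test RMSE; rows with NA are placed last
rmse_summary[order(rmse_summary$rmse_test), ]
```

On these numbers the tree and the random forest have similar test error, while the forest shows the larger gap between training and test RMSE.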

MODEL SELECTION